Dave Parr

Posted on Jun 10, 2020 • Edited on Dec 11, 2023 • Originally published at daveparr.info

Writing R packages, fast

#rstats #packages #tools

R packages are great. R users have a rich ecosystem of extensions to
help us doing various things. We have our own integrated package
management system,
CRAN,
and we also have Metacran which gives us an
easy way to work out how popular specific packages are used. R also
makes it ‘easy’ for you to write your own packages! Sort of.

How can I create and R package?

CRAN suggests making an R package is really
simple.

5.5 How can I create an R package? A package consists of a
subdirectory containing a file DESCRIPTION and the subdirectories R,
data, demo, exec, inst, man, po, src, and tests (some of which can be
missing). The package subdirectory may also contain files INDEX,
NAMESPACE, configure, cleanup, LICENSE, LICENCE, COPYING and NEWS.

See section “Creating R packages” in Writing R Extensions, for
details. This manual is included in the R distribution, see What
documentation exists for R?, and gives information on package
structure, the configure and cleanup mechanisms, and on automated
package checking and building.

R version 1.3.0 has added the function package.skeleton() which will
set up directories, save data and code, and create skeleton help files
for a set of R functions and datasets.

So I just run the function package.skeleton(), and then I just fill in
some R code right? Well, you could. But also, please don’t. Having done
it that way myself the first few times, you can do that, but that
doesn’t mean you should. Luckily R is a programming language, and
people use programming languages to automate things. So obviously people
have built packages the slow way that will help you build your package
easier, better, faster.
We should use those instead!

Starting a project

You should be using RStudio. It’s an Open
Source IDE for R specifically,
which is free to download. The fastest way to
start your project is to open RStudio, and go to File > New Project > New Directory > R Package. Then give it a name, and make sure to create
a git repository with it.

You now have a Hello World package. It contains a function called
hello in the R directory, with a few comments and notes about short
cuts. You will be able to Build the package, and you can run a
Check, which will give you a warning, and run the Test which
will give you an error.

You’ll also have a few other files and folders. man will hold the
manual, or documentation. The DESCRIPTION gives you a bunch of
metadata, and the NAMESPACE will only have the line
exportPattern("^[[:alpha:]]+"). Don’t worry about that for now, we’ll
fix it in a moment. .Rbuildignore is a bit like the .gitignore. It
will just contain references for R to not use when running Build.

Setting up a package and managing package structure with`usethis`

I talked about usethis in my last
post,
but didn’t do it nearly the justice it deserves. The best place to start
is Usage and then have a look at the
Reference.

In my projects I tend to use these functions to set up each one:

use_pipe() to allow the package to use pipes from magrittr
use_testthat() to set up tests that for the package
use_package() to include a dependency, such as ggplot2 or readr
use_vignette() to include long form documentation
use_readme_rmd()/use_readme_md() to create Readmes
use_badge() to fill in those Readmes with badges

And then to write my package I tend to use these functions to write it:

use_r() to create a new R file to source
use_test() to create a new test for a specific open R file
use_data() to include data sets in my packages

So you can do all these things manually, and maybe the first time you
should. Maybe you should learn that to use a package you need to modify
the Imports field in the DESCRIPTION, and maybe you need to learn
that to make testthat work in your package you need to make a specific
test folder, and then include a testthat.R file to make it work.
However, I also don’t need to implement my own class every time I want
to make an object that holds a data matrix either, and I also don’t need
to know how the interpreter actually works when I run my hello
function. There’s a reason why we don’t write in 1 and 0, and it’s the
same reason I don’t see the value in writing my own NAMESPACE files
any more. I can do it, but it’s a simple boring task that I’ve done a
bunch of times before, and that at this stage I’m actually likely to get
more wrong than the computer. Talking of which…

`NAMESPACE`, self-documenting code and self-coding documentation

The NAMESPACE helps your package understand what functions it sources
from what other packages, and what functions it allows to be used. The
R packages book has a chapter
on it that goes into detail, but the key thing to really understand is
how to use it.

As an example, lets run usethis::use_pipe(). This will set-up our
package to allow us to use the %>% operator to chain functions. Our
console tells us:

● Run `devtools::document()`

We also have a new file ./R/utils-pipe.R:

#' Pipe operator
#'
#' See \code{magrittr::\link[magrittr]{\%>\%}} for details.
#'
#' @name %>%
#' @rdname pipe
#' @keywords internal
#' @export
#' @importFrom magrittr %>%
#' @usage lhs \%>\% rhs
NULL

So this is a file that’s all comments? Not quite. This is a Roxygen
comment block with #', and it has special powers. It won’t actually be
executed if this file is sourced, but if we document our package, the
process will read these comment blocks, and the @ tags, and populate
out the documentation in ./man. As we got prompted though, we need to
run the documentation process manually, so lets do that with
devtools::document().

Updating making.packages documentation
First time using roxygen2. Upgrading automatically...
Writing NAMESPACE
Loading making.packages
Writing NAMESPACE
Writing pipe.Rd

Now we have some changes. Our NAMESPACE LOOKS LIKE THIS:

# Generated by roxygen2: do not edit by hand

export("%>%")
importFrom(magrittr,"%>%")

and we have a new ./man/pipe.Rd file:

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utils-pipe.R
\name{\%>\%}
\alias{\%>\%}
\title{Pipe operator}
\usage{
lhs \%>\% rhs
}
\description{
See \code{magrittr::\link[magrittr]{\%>\%}} for details.
}
\keyword{internal}

What’s happened? A few things:

Roxygen was associated with our package build to parse and process the comments
Roxygen re-wrote the NAMESPACE
Roxygen read the Roxygen comments in the ./R/utils-pipe.R
Roxygen used those special comments to put an import and an export line into the NAMESPACE
- The export will allow users to use the %>% when they load our package
- The importFrom will allow us to use the %>% in our own code in the package itself
Roxygen populated the manual page for the function so we can see it in the RStudio help.

Compare the two comment blocks now. What are the differences?

Roxygen comments

#' Pipe operator
#'
#' See \code{magrittr::\link[magrittr]{\%>\%}} for details.
#'
#' @name %>%
#' @rdname pipe
#' @keywords internal
#' @export
#' @importFrom magrittr %>%
#' @usage lhs \%>\% rhs
NULL

Roxygenised `.Rd`

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utils-pipe.R
\name{\%>\%}
\alias{\%>\%}
\title{Pipe operator}
\usage{
lhs \%>\% rhs
}
\description{
See \code{magrittr::\link[magrittr]{\%>\%}} for details.
}
\keyword{internal}

So the first big one is formatting. The first one looks (mostly) human
readable, while the second one has a lot more cruft. This cruft is
basically LaTeX. The order has been changed around and few of the labels
are a little different, but there is another big change. @export and
@importFrom aren’t in the documentation! Those were only included
to help Roxygen populate the NAMESPACE file. So you can see that those
two tags are actually amongst the most important things to include in
documenting other functions. If you check in the DESCRIPTION file we
now also have a new Imports field in the YAML with magrittr being
the only entry. Did you notice that there was no actual R code in this
function? It actually explicitly contained NULL. All of this was
purely generated from the documentation, but now we actually have a new
piece of fundamental functionality in our package. Weird huh?

The package skeleton’s connected to the comment bone. The comment bone’s connected to the sinew imports.

So now you’re all prepared to write your own package. You know how to
start a new package project, how to set it up with usethis and how to
document functions with Roxygen. Hold up though. That thing about not
having to do the basically the same fiddly boring job multiple times in
each package? I really meant it. The cool R kids don’t even write their
own Roxygen any more.

Go back to your ./R/hello.R file, and ditch those boring vanilla #
comments. Now source the function into your global environment so you
can run it interactively. Now install.packages("sinew"), and run
sinew::makeOxygen(hello).

#' @description FUNCTION_DESCRIPTION

#' @return OUTPUT_DESCRIPTION
#' @details DETAILS
#' @examples 
#' \dontrun{
#' if(interactive()){
#'  #EXAMPLE1
#'  }
#' }
#' @rdname hello
#' @export

So sinew has built us a template for
Roxygen, and put the function name in? Cute, but I don’t need that to
write one word for me. OK, how about something a bit more real world.
I’ve been making a wrapper for the dev.to
API
(using all these methods). If I go get my post_new_article function
and run sinew::makeOxygen(post_new_article):

#' @title FUNCTION_TITLE
#' @description FUNCTION_DESCRIPTION
#' @param file PARAM_DESCRIPTION
#' @param key PARAM_DESCRIPTION, Default: NA
#' @return OUTPUT_DESCRIPTION
#' @details DETAILS
#' @examples 
#' \dontrun{
#' if(interactive()){
#'  #EXAMPLE1
#'  }
#' }
#' @seealso 
#'  \code{\link[rmarkdown]{yaml_front_matter}},\code{\link[rmarkdown]{render}}
#'  \code{\link[readr]{read_file}}
#'  \code{\link[stringr]{str_remove}}
#'  \code{\link[glue]{glue}}
#'  \code{\link[httr]{POST}},\code{\link[httr]{add_headers}}
#' @rdname post_new_article
#' @export 
#' @importFrom rmarkdown yaml_front_matter render
#' @importFrom readr read_file
#' @importFrom stringr str_remove
#' @importFrom glue glue
#' @importFrom httr POST add_headers

This is my output. Sinew inspects my function, and then derives the
functions I need to import and adds them to the Roxygen comment block.
It also links the functions in the @seealso which will show up in the
docs. Remember that these now get automatically added to the
NAMESPACE and DESCRIPTIONS where they need to be, each time I
document my package. It’s also understood what arguments my function
takes, and populated them with the defaults, if present. Now, why would
you bother writing this by hand any more?

TL;DR

Start your package in the RStudio IDE as a new project
Set up your package with usethis
Populate your Roxygen comments with sinew
Profit!

DEV Community

Writing R packages, fast

How can I create and R package?

Starting a project

Setting up a package and managing package structure with`usethis`

`NAMESPACE`, self-documenting code and self-coding documentation

Roxygen comments

Roxygenised `.Rd`

The package skeleton’s connected to the comment bone. The comment bone’s connected to the sinew imports.

TL;DR

Top comments (0)

Read next

AI System Beats Humans at Understanding Chinese Speech Disorders, Largest Study Shows

Robot Policy Generator Creates Clear, Task-Specific Controls That Humans Can Understand

New AI Method Teaches Language Models to Cite Sources Accurately Without Manual Training

Breakthrough: Smart Networks that Understand Context Show 50% Better Performance in Heterogeneous Systems

How can I create and R package?

Starting a project

Setting up a package and managing package structure withusethis

NAMESPACE, self-documenting code and self-coding documentation

Roxygen comments

Roxygenised .Rd

The package skeleton’s connected to the comment bone. The comment bone’s connected to the sinew imports.

TL;DR

Read next

AI System Beats Humans at Understanding Chinese Speech Disorders, Largest Study Shows

Robot Policy Generator Creates Clear, Task-Specific Controls That Humans Can Understand

New AI Method Teaches Language Models to Cite Sources Accurately Without Manual Training

Breakthrough: Smart Networks that Understand Context Show 50% Better Performance in Heterogeneous Systems

Setting up a package and managing package structure with`usethis`

`NAMESPACE`, self-documenting code and self-coding documentation

Roxygenised `.Rd`