Sunday, September 27, 2015

Program Sharing: My Experience Submitting to CRAN

R is an open source statistical programming environment that we have discussed here on the blog before. It is the statistical software of choice for many scientists because it is freely available, incredibly powerful, and backed by a strong, enthusiastic community. In addition to being openly available to anyone, it is also open to your programming contributions. Contributions to R are most often made as packages, which are defined as follows:



"[R] packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data."  
 ~Hadley Wickham (R Packages)

So this means that if you have a great idea for some R functions, you can wrap them up in a package and make them easily available for everyone else in the world to use. Packages are most commonly stored in the Comprehensive R Archive Network (CRAN). This network makes your code easily accessible to anyone by allowing direct download through the R command line or even with a few clicks in RStudio. This also ensures the packages will be of relatively high quality, because CRAN maintains high standards for submission and tests every package multiple times every day to check for errors.
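As a rough illustration of the "direct download through the R command line" mentioned above, here is a minimal sketch. The helper name install_if_missing is hypothetical (it is not part of R or of any package), and the mirror URL is simply one public CRAN mirror chosen for the example.

```r
# Hypothetical helper: install a package from CRAN only if it is not
# already available, then report whether it can be loaded.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns TRUE when the package is already installed.
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(TRUE))
  }
  # Otherwise download and install it from the chosen CRAN mirror
  # (this step needs a network connection).
  install.packages(pkg, repos = repos)
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```

Calling install_if_missing("patPRO") followed by library(patPRO) would then fetch and load the package, assuming it is still available on CRAN for your version of R.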

At one point during my analysis in lab, I developed a workflow that I thought could be useful for other people. This was a set of functions that allowed the user (our project team at the time) to generate microbiome profiles for patients over time. We used this to monitor microbiome community signatures as patients' wounds healed after a traumatic wounding event (see the resulting paper from this project). Other people thought this was pretty cool, so we went ahead and started wrapping it up into a package. Because it was for generating patient profiles, we decided to call it patPRO (you can check out patPRO on GitHub or CRAN).

Although it was great helping people with their own analysis by creating this package, I was equally motivated by using this as a learning experience. I wanted the experience of taking a package from an idea to a published result. In the end, I thought this was a great opportunity, and I wanted to share it with you in case you also want to give it a try.

Hosting Your Project on GitHub

When you start collecting your package documents, be sure to collect them in a GitHub repository. As I mentioned before on this blog, GitHub will provide you with solid version control and a great medium for sharing your project, pre- and post-development. Especially do this for the version control, because there will be a lot of moving parts and you are going to want to manage your changes as you develop the code.

Building the Package

Your package is basically going to be a directory (which will eventually be compressed into a tarball) with defined file and subdirectory components. RStudio will actually help you create an R package with the required components, but here is an overview in case you need a little more help.

  • ./R will hold the R scripts defining your functions. All of the functions can be in one file, separate files, or somewhere in between. 
  • ./data will contain your example dataset, which will be used for examples and unit testing. 
  • ./inst includes installed files, including additional external data for examples.
  • ./man includes the manual files, which hold the package documentation in Rd (R documentation) format. This also includes examples for code testing.
  • ./DESCRIPTION is a text file with the information describing the package.
  • ./NAMESPACE is another text file that is a little confusing. It basically lets your package play well with others by declaring which functions your package exports and which it imports from other packages. I primarily used it for managing dependencies, which is pretty basic, but it also has more advanced applications that I will leave you to read about on your own.
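
To make the layout above a little more concrete, here is a minimal sketch that builds an empty skeleton by hand in a temporary directory using only base R. Everything named here (examplepkg, middle_value, the DESCRIPTION field values) is a hypothetical placeholder, not something taken from patPRO, and a real package would need more than this to pass CRAN checks.

```r
# Create the package directory and its standard subdirectories.
pkg_dir <- file.path(tempdir(), "examplepkg")
for (sub in c("R", "data", "inst", "man")) {
  dir.create(file.path(pkg_dir, sub), recursive = TRUE, showWarnings = FALSE)
}

# DESCRIPTION: package metadata, written as "Field: value" lines.
description <- data.frame(
  Package = "examplepkg",
  Title = "Hypothetical Example Package",
  Version = "0.0.1",
  Description = "A placeholder package used to illustrate the layout.",
  License = "GPL-3",
  stringsAsFactors = FALSE
)
write.dcf(description, file = file.path(pkg_dir, "DESCRIPTION"))

# NAMESPACE: export our own function and import one from another package.
writeLines(
  c("export(middle_value)", "importFrom(stats, median)"),
  file.path(pkg_dir, "NAMESPACE")
)

# R/: one script defining the exported (hypothetical) function.
writeLines(
  c("middle_value <- function(x) {", "  median(x)", "}"),
  file.path(pkg_dir, "R", "middle_value.R")
)

# List what was created.
list.files(pkg_dir, recursive = TRUE, include.dirs = TRUE)
```

In practice RStudio or a helper package will generate these files for you; writing them out once by hand is mostly useful for seeing how little magic is involved.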


Get Ready to Learn R

I think that one of the best ways to learn a programming language is to dive in with a project that forces you to keep learning more. This package can be that project! This was a huge learning experience for me because it gave me a chance to keep coding in R in new ways that I might not have otherwise considered. This is a learning experience, so get ready to learn!

Get Ready to Learn Better Coding Practices (Including Unit Testing)

Not only will you learn more about R coding, but you will also learn to adopt and appreciate better coding practices. This includes unit testing your code and following specific guidelines for more efficient programming and documentation. The CRAN submission process holds you to a high standard, so you will be forced to write unit tests for everything (which will be run every day on multiple operating systems) and heavily document your code. It is a pain at first, but you soon learn to appreciate its utility, and I bet you will even start doing this in the rest of your programming projects!
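
If you have never written a unit test in R, here is a minimal sketch of what one can look like. I am assuming the testthat package here, which is a common choice but not the only one, and relative_abundance is a hypothetical helper made up for this example rather than a function from patPRO.

```r
library(testthat)

# Hypothetical helper: convert raw counts to relative abundances.
# counts: a numeric vector of non-negative counts with a positive total.
relative_abundance <- function(counts) {
  stopifnot(is.numeric(counts), all(counts >= 0), sum(counts) > 0)
  counts / sum(counts)
}

# Each test states one expectation about the function's behavior.
test_that("relative abundances sum to one", {
  result <- relative_abundance(c(10, 30, 60))
  expect_equal(sum(result), 1)
  expect_equal(result, c(0.1, 0.3, 0.6))
})

test_that("invalid input is rejected", {
  expect_error(relative_abundance(c(-1, 2)))
  expect_error(relative_abundance(c(0, 0)))
})
```

In a package, tests like these usually live in their own test files so that they run automatically whenever the package is checked.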

Run R Development Checks Before Submitting

CRAN will hold your package to a high standard before it is accepted, but they give you tools to help with the process. The most important process is checking your package against their test scripts. This will look for almost all types of formatting and coding errors, as well as deviations from standard documentation. You will likely see a lot of errors, warnings, and notes, which is totally cool. Just go ahead and fix them until every single test passes with "ok". Once you get this done, go ahead and submit to CRAN.

In the R command line, type:

devtools::check("/PATH/TO/R-PACKAGE/")

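If you are on a Mac, be sure to also check it against R for Windows. In current devtools releases this is done with check_win_release() or check_win_devel(), which replaced the older build_win():

devtools::check_win_release("/PATH/TO/R-PACKAGE/")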

Don't Be Intimidated

I admit that I was nervous to submit something to CRAN. I have heard some people say that it was a terrible experience for them, including a lot of personal attacks. All I can really say though is that my experience was entirely good. I got some feedback upon my first submission, but it made the code better and was not aggressive. I fixed it up, sent the package back in, and everything was good to go. Now I know my package was relatively basic compared to others, but it was a great experience, and yours probably will be too. So stop worrying and go for it!

So in the end, I had a great experience building an R package and submitting it to CRAN. I learned so much and am a better coder as a result. If you have a great idea and some time to spend on package development, go for it and embrace the learning experience!

And as always, feel free to reach out with any questions, corrections, or concerns. I would love to hear from you. Leave a comment below, Tweet me, or shoot me an email.
