Reproducibility and R – Better results, better code, better science.

I made a short presentation for our most recent weekly lab meeting about best practices for reproducible research.  There are a few key points.  The first is that the benefits of reproducible research are not just for the community.  Producing reproducible code helps you, both after publication (higher citation rates: Piwowar and Vision, 2013) and in the long run, in terms of your ability to tackle bigger projects.

Let's face it: if you intend to pursue a career inside or outside of academia, your success is going to depend on tackling progressively larger or more complex projects.  If programming is going to be a part of that, then developing good coding practice should be a priority.  One way to get into the habit of good practice is to practice.  In the presentation (PDF, figShare) I point to a hierarchy (of sorts) of good scientific coding practice, and reproducible programming helps support that practice:

  1. An integrated development environment (IDE) helps you organize your code in a logical manner, makes some repeatable tasks easier, and provides tools and views that make the flow of code easier to read (helping you keep track of what you’re doing).
  2. Version control helps you make incremental changes to your code, comment those changes clearly, and fix mistakes if you break something.  It also helps you learn from your old mistakes: you can go back through your commit history and see how you fixed problems in the past.
  3. Embedded code helps you produce clean and concise code with a specific purpose, and it helps you in the long run by reducing the need to “find and replace” values throughout your manuscript.  It helps reviewers as well.  Your results are simply a summary of the analysis you perform; the code is the analysis.  If you can point readers and reviewers to the code, you save everyone time.
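
To make the third point a little more concrete, here is a minimal sketch in plain R of what embedded code buys you.  The data and variable names are made up for illustration and are not taken from the presentation.  The reported sentence is built from values computed in the code, so if the data change the text changes with them and there is nothing to find and replace:

```r
# Hypothetical data: richness counts at five sites (made-up values).
richness <- c(12, 18, 9, 22, 15)

# Compute the summary values once, in code.
n_sites <- length(richness)
mean_richness <- mean(richness)

# Build the reported sentence from the computed values, much as an
# inline R expression would do inside an RMarkdown document.
sentence <- sprintf(
  "Mean richness across %d sites was %.1f taxa.",
  n_sites, mean_richness
)
print(sentence)
```

In an RMarkdown manuscript the same idea would usually be written as inline R expressions within the prose rather than with sprintf(), but the principle is the same: the numbers in the text come from the analysis, not from your keyboard.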

So take a look at the presentation and let me know what you think.  And, if you are an early-career researcher, make now the time to start good coding practice.

Writing and collaborating on GitHub, a primer for paleoecologists

At this point I’ve written a hundred times about the supplement for Goring et al. (2013), but just in case you haven’t heard it:

Goring et al. (2013) uses a large vegetation dataset to test whether or not pollen richness is related to plant richness at a regional scale.  Because of the nature of the data and analysis, and because I do all of my work (or most of it) using R, I thought it would be a good idea to produce totally reproducible research.  To achieve this I included a version of the paper, written using RMarkdown, as a supplement to the paper.  I also posted the supplement to GitHub so that people who were interested in looking at the code more deeply could create their own versions of the data and could use the strengths of the GitHub platform to help them write their own code or do similar analyses.

This is a basic how-to to get you started writing your paper using RMarkdown, RStudio and GitHub (EDIT: if some of these instructions don’t work let me know and I’ll fix them immediately):

Continue reading Writing and collaborating on GitHub, a primer for paleoecologists