The neotoma package for R.

The neotoma package has been available for a little while (at figshare here and at GitHub here).  I wrote it up and posted it in the middle of April, before my talk at the University of Minnesota, but I’ve been putting off writing about it here until it was a bit more polished.

So here is your introduction to the neotoma package for R, hosted & supported by rOpenSci (after the break):

The neotoma package is intended to interface with the Neotoma Paleoecological Database.  The database hosts paleoecological data spanning the Pliocene-Quaternary, originally from two major paleoecological databases, the North American Pollen Database and FAUNMAP.  Currently Neotoma supports working groups for a number of different paleoecological datasets, and is working toward integrating more data into the database as we speak.

In the meantime, authors including myself and Jessica Blois have been using the Neotoma database to understand patterns of change in deposition rates and in community composition and dissimilarity (respectively) over time.  In our papers we had to download the entire Neotoma database before doing any analysis.  I was hoping to make this process easier by creating a package for R that lets investigators download and analyse data directly through APIs.  That gives a reproducible workflow, makes preliminary data analysis easier, and could give educators a way to use Neotoma data as a learning tool in course lab work.

Figure 1. The distribution of pollen sample sites with Spruce pollen dating from the Last Glacial Maximum, 21,000 years ago (21ka).

To begin to use the neotoma package you need to do a few things first:

  1. Download the compressed package file to your computer and then use install.packages to install it as a package in R, or if you’re using RStudio, install it from there.

Okay, that was one thing.  Look at how easy that was!
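For anyone who would rather do that one step from the R console, here is a minimal sketch. The helper name install_local and the file path are hypothetical (the path depends on where you saved the download), and this assumes the packages that neotoma depends on are already installed, since installing from a local file does not fetch dependencies.

```r
# Hypothetical helper: install a package from a downloaded archive,
# but only if the file is actually where we think it is.
install_local <- function(pkg_file) {
  if (!file.exists(pkg_file)) {
    message("Package archive not found at: ", pkg_file)
    return(invisible(FALSE))
  }
  # repos = NULL tells install.packages() to treat pkg_file as a local file
  # rather than a package name to look up in a repository.
  install.packages(pkg_file, repos = NULL)
  invisible(TRUE)
}

# Placeholder path: replace it with the location of your downloaded file.
install_local("path/to/neotoma.tar.gz")
```

The file check is only there so that a mistyped path gives a readable message instead of an install error.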

Once you’ve installed the package you just have to load it with library(neotoma) and you’re off to the races. I’m going to show you two examples to help get you started: the first shows the distribution of sites with Spruce pollen detected around the Last Glacial Maximum (at 21ka), and the second shows the number of publications in Neotoma by year.  Both are fairly simple analyses, but they hopefully highlight some of the potential applications of the package and the Neotoma database.

library(ggplot2)
library(neotoma)
library(maps)
library(plyr)  #  provides ldply(), which is used below
#  Let's make a base map of North America for display:
all.world <- map_data('world')
northam <- subset(all.world, region %in% c('Canada', 'USA', 'Mexico'))

#  Now let's find all the records of Spruce (Picea) at the Last Glacial Maximum from
#  the neotoma database:

lgm.picea <- get_datasets(taxonname='Picea', ageold=22000, ageyoung=20000)
loc.fromset <- function(x) data.frame(lat = x$Site$LatitudeNorth, long = x$Site$LongitudeWest)
lgm.sites <- ldply(lgm.picea, loc.fromset)

#  Just want to get rid of the European sites:
lgm.sites <- lgm.sites[lgm.sites$long < -20,]

ggplot() + geom_polygon(data = northam, aes(x = long, y = lat, group = group), fill = 'white', color='gray') +
  geom_point(data = lgm.sites, aes(x = long, y = lat)) +
  annotate('text', x = -180, y = 25, label='Picea Pollen Distribution', hjust=0, size=8) +
  annotate('text', x = -180, y = 20, label='Last Glacial Maximum\n22 - 20ka', hjust=0) +
  theme_bw() + xlim(-180, -50) + ylim(14, 80)
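If you want to see what the site-extraction step is doing without hitting the database, here is a small offline sketch. The mock records are invented for illustration; only the field names (Site$LatitudeNorth and Site$LongitudeWest) are copied from the code above, and the base-R do.call/lapply pair stands in for ldply as a rough equivalent.

```r
# Mock records shaped like the objects used in the post's code.
# The coordinates are invented; they are not real Neotoma sites.
mock.datasets <- list(
  list(Site = list(LatitudeNorth = 44.9, LongitudeWest = -93.2)),
  list(Site = list(LatitudeNorth = 36.1, LongitudeWest = -84.5)),
  list(Site = list(LatitudeNorth = 47.5, LongitudeWest = 8.3))
)

# Pull one latitude/longitude row out of each record
loc.fromset <- function(x) {
  data.frame(lat = x$Site$LatitudeNorth, long = x$Site$LongitudeWest)
}

# Rough base-R counterpart of ldply(mock.datasets, loc.fromset):
# apply the function to every record, then stack the rows.
mock.sites <- do.call(rbind, lapply(mock.datasets, loc.fromset))

# Keep only sites west of 20 degrees W, which drops the European record
mock.sites <- mock.sites[mock.sites$long < -20, ]
print(mock.sites)
```

Western longitudes are negative, so the long < -20 filter is a blunt but serviceable way to separate the Americas from Europe.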

And now, here’s the number of publications in Neotoma, by year (including vertebrate fossils, pollen and ostracods):

pubs <- get_publication()
pub.years <- as.numeric(as.character(pubs$Year))

ggplot(data=data.frame(x = pub.years), aes(x)) +
  stat_bin(aes(y = ..density.. * 100), position = 'dodge', binwidth = 1) +
  theme_bw() +
  ylab('Percent of Publications') +
  xlab('Year of Publication') +
  scale_y_continuous(expand = c(0, 0.1)) +
  scale_x_continuous(breaks = seq(min(pub.years, na.rm=TRUE), 2013, by=20))
Figure 2. The distribution of publication years in the Neotoma database as a percent. There are a total of 3693 publications in Neotoma, but 32 publications have non-numeric publication years.
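The year conversion and the percent scale can be checked on a toy vector. This is only a sketch with invented values (a real run would start from pubs$Year as in the code above), and it assumes the years arrive as a factor or character vector. With a bin width of one year, density times 100 is the same thing as percent of publications, which is what the tabulation below computes directly.

```r
# Invented example values standing in for pubs$Year
mock.years <- factor(c("1984", "1984", "1991", "in press", "2003", "n.d."))

# factor -> character -> numeric; anything non-numeric becomes NA.
# R normally warns about the coercion, so the warning is suppressed here.
years.num <- suppressWarnings(as.numeric(as.character(mock.years)))

n.total <- length(years.num)    # all publications
n.bad <- sum(is.na(years.num))  # publications with a non-numeric year

# Percent of publications per year, among those with a usable year
# (table() drops NA values by default)
pct.by.year <- 100 * prop.table(table(years.num))
print(pct.by.year)
```

The same two counts, length() and sum(is.na()), are one way to arrive at totals like those quoted in the caption.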

The interesting thing about Figure 2 is the clear peak in publication year at 1984. This is more than likely an artifact of the database. Neotoma is currently in its first version, but there are a large number of datasets in the holding tank, waiting for the next update of the database. We will be adding a flag to the neotoma package soon so that the user can define which version of the database to access. This will help support reproducible research by allowing supplementary data to reference a specific “snapshot” of the database, so that code samples will be numerically reproducible into the future.

I welcome everyone to try out the package and explore the Neotoma database. This is my first time making a package, so if you find bugs, have any problems, or have suggestions to improve the package, please let me know, either in the comments here, on Twitter or by email. This package may have a very specific niche, but that doesn’t mean it can’t be usable!

Published by

downwithtime

Assistant scientist in the Department of Geography at the University of Wisconsin, Madison. Studying paleoecology and the challenges of large data synthesis.

3 thoughts on “The neotoma package for R.”

  1. Thanks for the post and the package. They are very useful! I have two small notes about getting this code to run. First, in line 6 of the first chunk of code “all_world” should be “all.world” and in line 16 of the same chunk of code “&lt;” should be “<”.

    1. Thanks, I’ll fix it. WordPress keeps reformatting the code blocks, so I’ve rewritten this post a few times. Keep me updated as to how you’re getting along with the code!
