June 6 – 7: Second Annual Macrosystems Biology PI Meeting, plenary speaker, Washington D.C. [Link]
The neotoma package for R.
The neotoma package has been available for a little while (at figshare here and at github here). I wrote it up and posted it in the middle of April before my talk at the University of Minnesota, but I’ve been delaying writing about it here until it was a bit more polished.
So here is your introduction to the neotoma package for R, hosted & supported by ROpenSci:
The neotoma package is intended to interface with the Neotoma Paleoecological Database. The database hosts paleoecological data spanning the Pliocene-Quaternary, originally from two major paleoecological databases, the North American Pollen Database and FAUNMAP. Currently Neotoma supports working groups for a number of different paleoecological datasets, and is working toward integrating more data into the database as we speak.
In the meantime, authors, including myself and Jessica Blois, have been using the Neotoma database to understand patterns of change in deposition rates and community composition and dissimilarity (respectively) over time. In our papers we had to download the entire Neotoma database and then run our analyses. I was hoping to make this process easier by creating a package for R that would allow investigators to download and analyse data directly through APIs, producing a reproducible workflow, facilitating preliminary data analysis, and potentially providing an opportunity for educators to use Neotoma data as a learning tool in courses as part of lab work.

Figure 1. The distribution of pollen sample sites with Spruce pollen dating from the Last Glacial Maximum, 21,000 years ago (21ka).
To begin to use the neotoma package you need to do a few things first:
- Download the compressed package file to your computer and then use install.packages to install it as a package in R, or if you’re using RStudio, install it from there.
Okay, that was one thing. Look at how easy that was!
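If you’d rather do the install from the R console, something like the following should work. This is only a minimal sketch: the file name neotoma.tar.gz is a placeholder for wherever you saved the download, and type = "source" assumes a source archive rather than a platform-specific binary.

```r
# Hypothetical path: replace with wherever you saved the downloaded archive.
pkg_file <- "neotoma.tar.gz"

if (file.exists(pkg_file)) {
  # repos = NULL tells R to install from the local file rather than a repository.
  install.packages(pkg_file, repos = NULL, type = "source")
} else {
  message("Package archive not found at: ", pkg_file)
}
```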
Once you’ve installed the package you just have to load it with library(neotoma) and you’re off to the races. I’m going to show you two examples to help get you started, the first showing the distribution of sites with Spruce pollen detected around the Last Glacial Maximum (at 21ka), and the second showing the number of publications in Neotoma by year. Both are fairly simple analyses, but hopefully they highlight some of the potential applications of the package and the Neotoma database.
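# Load the packages used below (map_data() also needs the maps package installed):
library(neotoma)
library(ggplot2)
library(plyr)
# Let's make a base map of North America for display:
all.world <- map_data('world')
northam <- subset(all.world, region %in% c('Canada', 'USA', 'Mexico'))
# Now let's find all the records of Spruce (Picea) at the Last Glacial Maximum from
# the neotoma database: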
lgm.picea <- get_datasets(taxonname='Picea', ageold=22000, ageyoung=20000)
loc.fromset <- function(x) data.frame(lat = x$Site$LatitudeNorth, long = x$Site$LongitudeWest)
lgm.sites <- ldply(lgm.picea, loc.fromset)
# Just want to get rid of the European sites:
lgm.sites <- lgm.sites[lgm.sites$long < -20,]
ggplot() + geom_polygon(data = northam, aes(x = long, y = lat, group = group), fill = 'white', color='gray') +
geom_point(data = lgm.sites, aes(x = long, y = lat)) +
annotate('text', x = -180, y = 25, label='Picea Pollen Distribution', hjust=0, size=8) +
annotate('text', x = -180, y = 20, label='Last Glacial Maximum\n22 - 20ka', hjust=0) +
theme_bw() + xlim(-180, -50) + ylim(14, 80)
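If you want to see what the site-extraction and filtering steps are doing without hitting the API, here is a small sketch using made-up data. The toy list only mimics the Site fields used above (its coordinates are invented, not Neotoma records), and I’ve used base R in place of ldply so that it runs without plyr; the two should behave roughly the same here.

```r
# Toy stand-ins for the list returned by get_datasets(); the coordinates are
# invented for illustration and are not real Neotoma records.
toy.datasets <- list(
  list(Site = list(LatitudeNorth = 45.2, LongitudeWest = -93.1)),
  list(Site = list(LatitudeNorth = 38.7, LongitudeWest = -77.4)),
  list(Site = list(LatitudeNorth = 47.5, LongitudeWest = 8.6))   # a European site
)

# Same helper as above: pull latitude and longitude out of one dataset.
loc.fromset <- function(x) {
  data.frame(lat = x$Site$LatitudeNorth, long = x$Site$LongitudeWest)
}

# Rough base R equivalent of ldply(toy.datasets, loc.fromset): apply the
# helper to each element and stack the one-row data frames.
toy.sites <- do.call(rbind, lapply(toy.datasets, loc.fromset))

# Keep only sites west of 20 degrees W, which drops the European site.
toy.sites <- toy.sites[toy.sites$long < -20, ]
print(toy.sites)
```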
And now, here’s the number of publications in Neotoma, by year (including vertebrate fossils, pollen and ostracods):
pubs <- get_publication()
pub.years <- as.numeric(as.character(pubs$Year))
ggplot(data=data.frame(x = pub.years), aes(x)) +
stat_bin(aes(y = ..density.. * 100), position = 'dodge', binwidth = 1) +
theme_bw() +
ylab('Percent of Publications') +
xlab('Year of Publication') +
scale_y_continuous(expand = c(0, 0.1)) +
scale_x_continuous(breaks = seq(min(pub.years, na.rm=TRUE), 2013, by=20))

Figure 2. The distribution of publication years in the Neotoma database as a percent. There are a total of 3693 publications in Neotoma, but 32 publications have non-numeric publication years.
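For anyone wondering where the non-numeric count comes from, here is a minimal sketch of the conversion step with invented values (not the real get_publication() output). With a bin width of one year, density times 100 is the same as the percent of publications in each year, which is what the table at the end computes directly.

```r
# Invented example values; real entries would come from get_publication().
toy.years <- factor(c("1984", "1991", "in press", "1984", "n.d.", "2003"))

# Convert via character first: as.numeric() on a factor alone returns the
# internal level codes, not the years themselves. Non-numeric strings become NA.
years.num <- suppressWarnings(as.numeric(as.character(toy.years)))

n.total <- length(years.num)
n.nonnumeric <- sum(is.na(years.num))

# Percent of publications per year; table() drops the NA values by default.
year.pct <- 100 * prop.table(table(years.num))
print(year.pct)
```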
The interesting thing about Figure 2 is that there is a clear peak in the publication age range at 1984. This is more than likely associated with an artifact of the database. Neotoma is currently in its first version, but there are a large number of datasets in the holding tank, waiting for the next update of the Neotoma database. We will be adding a flag into the neotoma package soon so that the user can define which version of the database to access. This will help support reproducible research by allowing supplementary data to reference a specific “snapshot” of the database, so that code samples will be numerically reproducible into the future.
I welcome everyone to try out the package and explore the Neotoma database. This is my first time making a package, so if you find bugs, have any problems, or have suggestions to improve the package please let me know, either in the comments here, on twitter or by email. This package may have a very specific niche, but that doesn’t mean it can’t be usable!
Focusing on the big picture: the ups and downs of writing a paper.

Figure 1. A camel with many humps. Probably a relative of Alice’s given genetics and whatnot. (Credit: flickr user bristley)
I’m in the middle of revising two papers, with a third almost ready for submission. It’s a great situation to be in, but it’s really time consuming and I feel like I’ve left some of my other projects in limbo. For me the process is like Alice the Camel; there are a lot of humps:
- I’m pretty excited. I’ve done some analysis, enough that I feel like there’s something there. I have an idea of where I want the paper to go, and I’m busy writing out an outline with some figure ideas so that I can sort out what needs to be done.
- Erg, something’s not working, some of my results are different than the last time I looked. I’m having some issues making graphs that look reasonable.
- Awesome! The main analysis is done, I’m adding some descriptive stats which are pretty straightforward and I’ve got a good map of the discussion, I can see the end!
- Gah! Sent the paper out to co-authors and they don’t see how awesome some of the things I’ve done are. But. . .
- Sweet. Working through the co-author comments helps crystallize the paper and makes it much cleaner.
- Aw man, journal submission is a drag. Oh, and I forgot to write a cover letter. What did Dynamic Ecology say about them?
- Yes! Journal accepted the paper!
- Argh! These reviewers are asking me to do some annoying things. Resubmitting is just as annoying as submitting.
- Okay, it’s off my plate for a while.
- Yay! Publication!
- Gah, more minor changes, no problem, off again.
- Look! It’s my paper proofs!
- Arg, missed a couple typos, some changes are needed.
- Yay! Everything’s done.
- Yay! Someone’s cited my paper!
- Gah, on to the next paper. Publish or perish, young researcher. . .
Something like that anyway. How common is this? What’s your favorite part of writing? Least favorite?
The big point here is that for all the ups and downs, it is important to look at the big picture. Is the paper you’re writing important? Are you, in a broad sense, excited about it? Sometimes I have to remind myself of the big picture as I slog through reviewer comments, but I’ve been lucky to consistently have reviewers who are supportive. On my first paper a reviewer’s comment on line 100-and-something was simply: “This is really interesting.” It’s these kinds of comments that help keep things positive.
So, back to revisions. Hope you are all making progress too.
The Nature Geoscience Climate 2k Journal Club
Nature Publishing Group held a hangout on Google+ to discuss the recent paper in Nature Geoscience summarizing global climate changes over the last 2kyr (the discussed article is: Continental-scale temperature variability during the past two millennia; here).
The conversation began with Thorsten Kiefer discussing the PAGES 2K initiative, followed by Nicholas McKay discussing the main finding of the paper. Nerilie Abram followed Nicholas to discuss some of the patterns in the Antarctic Peninsula. Gavin Schmidt then discussed the importance of the research in a broader context and how the data can be used as a tool to improve and understand climate model outputs.
You can view the discussion here, but I’d like to focus my comments here on what went well, what didn’t and how useful these kinds of journal groups are for research dissemination.
Pros:
- It’s nice to see the authors discussing their points in a more informal way than the actual publication. Alicia Newton, an associate editor at Nature Geoscience, led the discussion and did a good job of hitting the main points of the paper, and some of the interesting points. In particular I thought it was great to bring Nerilie Abram in to discuss the melt record. It wasn’t a critical piece of the overall story (although it is important for understanding regional forcings), but it was interesting for some background into paleoclimate proxies and reconstructions. Same goes for the discussion of African records by Nicholas McKay.
- It’s hard to judge how this would be perceived by a non-expert, but I thought that the presenters did a good job of presenting the paper results in a clear and straightforward manner.
- The conclusion of the Journal Club involved a look ahead, and I thought that was a good note to end on.
Cons:
- There were some issues with the slides (which slides needed to come up when), but this was fairly minor.
- It’s difficult to follow who individuals are. In Google+ you can hover over people’s pictures to see who you’re talking to, but this discussion was nested in a youtube video, so hovering over people’s heads just brought up the youtube slider bar.
- Very few questions from the rabble, but luckily the presenter was well prepared and there was lots to discuss.
- Interestingly, there was little tweeting or Google+ activity surrounding the discussion. I suppose it’s not a con, but it might have been helpful to have someone live tweet the discussion. Perhaps that could have engaged a larger audience and prompted more questions.
Summary
This was my first Google+ journal club hangout, and it was totally worthwhile. I enjoyed hearing about the paper from the authors, and it was also nice seeing the authors, rather than just imagining what they might look like(!). I’m curious how this could be harnessed outside of the aegis of a large publisher. For example, how effective would this be for a lab group? As Jacquelyn Gill has mentioned elsewhere, Google+ is a good tool for collaborating. It would be interesting to see if a semi-formal journal club could be established and maintained where the authors are invited and papers are discussed with the public. If anyone knows of something like this, let me know in the comments.
If you want to watch the recorded Journal Club, you can check this YouTube link.
UPDATE: I’ve added in Alicia Newton’s name; she is an associate editor at Nature Geoscience.
Open Science, Reproducibility, Credit and Collaboration
I had the pleasure of going up to visit the Limnological Research Center (LRC) at the University of Minnesota this past week. It’s a pretty cool setup, and obviously something that we should all value very highly, both as a resource to help prepare and sample sediment cores, and as a repository for data. The LRC has more than 4000 individual cores, totaling over 13km of lacustrine and marine sediment. A key point here is that much of this sediment is still available to sample, but this is still data in its rawest, unprocessed form.
A delay, lots on the horizon!
There is a special issue coming out in Climate of the Past on Holocene climate changes in the central Mediterranean (here). I’ve been involved with a number of the researchers for some time (and have commented on the serendipity of this relationship on this blog) and am pleased to have worked on several of the papers, although only one has made its way through the full review process (Joannin et al., 2013).
The papers resolve a longstanding conflict among records from the central Mediterranean, place regional Holocene climate changes into the context of global climatological systems and anthropogenic effects, explore several new multi- and single-proxy records from lake and marine sediments, and, ultimately, establish a synthesis. But you’ve got to wait for that a little bit longer.

Figure 1. The Cladoceran Daphnia. Cladocerans may more broadly be useful for climate reconstruction if the results of a new study hold.
In other news, hey, there’s a new paleoclimatic proxy to use! The Journal of Biogeography has an interesting early view article by Nevalainen et al. (here) about the use of Cladocera as a climate proxy in Finland. They fit GLMs to species response curves along climatic gradients and find good fit for a number of taxa. I’m actually surprised: Hann, in the famous Methods in Quaternary Ecology, suggests that they’re not a particularly reliable climatic indicator (here) due to their strong relationships to the local aquatic environment (arguing that that local environment buffers climate change), but I’m willing to be swayed.
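For readers who haven’t fit a species response curve before, here is a generic sketch of the idea. To be clear, this is not the analysis from Nevalainen et al.: the data are simulated, the taxon and its temperature optimum are hypothetical, and I’m assuming a logistic GLM with a quadratic temperature term, which is one common way to get a unimodal response along a gradient.

```r
set.seed(42)

# Simulated data: a hypothetical taxon with an optimum near 15 degrees C.
temp <- runif(200, min = 5, max = 25)
p.true <- plogis(2 - 0.1 * (temp - 15)^2)
present <- rbinom(200, size = 1, prob = p.true)

# Gaussian-type logistic response: the quadratic term in the linear predictor
# gives a unimodal response curve along the temperature gradient.
fit <- glm(present ~ temp + I(temp^2), family = binomial)

# The fitted quadratic peaks at -b1 / (2 * b2), the estimated optimum.
b <- coef(fit)
optimum <- unname(-b["temp"] / (2 * b["I(temp^2)"]))
print(optimum)
```

A negative coefficient on the squared term is what makes the curve hump-shaped; if it comes out positive the taxon has no optimum within the sampled gradient and this summary is not meaningful.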
Once the special issue is released I’ll write more about climate in the central Mediterranean, and I’ve got good feedback on a paper we submitted recently, so I’ll be more productive on downwithtime soon. Keep reading!
If you’re going to solicit papers for a journal, use proper grammar. Lessons for predatory publishers.
Here’s a great solicitation from David Publishing Company, a company on Beall’s list of predatory publishers (I’ve reformatted it so you don’t have to see all the fonts they used):
Dear Goring, Simon J ,
This is Earth Science and Engineering (ISSN 2159-581X), a new professional journal published across the United States by David Publishing Company, Chicago, IL, USA. We have learned your paper“RELIABLE GRIDDED ESTIMATES OF PRE-SETTLEMENT VEGETATION FOR THE UPPER MIDWEST FROM PUBLIC LANDS SURVEY DATA” in the AMQUA 2012 . We are very interested in your paper . If the paper has not published in other journal ,we would like to publish your paper in our journal Earth Science and Engineering. All your original and unpublished paper are welcome (although the paper has already been published in the conference, it can also publish in our journal, becauce there is no ISSN or ISBN for the conference). If you have the idea of making our journal a vehicle for your research interests, please send electronic version of your papers or books to us through email attachment in MS word format.
Sounds great, but if you can’t reliably edit your own email solicitations then I’m not interested. Then there’s this at the end:
As an American academic publishing group, we wish to become your friends if we may.
I’m not really sure what that’s about. Are they friendly because they’re American? Because we’re both American? Do they know I’m Canadian, and if they don’t, will that hurt my chances of being their friend? How do you become friends with a publishing group anyway? So many questions. . .
Their website leads to more questions. From the “For Authors” page (blank), to the Survey on the front page (what’s it even about?). At least they provide me with the option to list myself as an “Academician” when I submit my paper. That’s the number one reason I’ve never submitted to Nature or Science, no opportunity to list my title as “Academician”.
Vegetation-climate relationships using historical climate data from the 19th Century Forts & Observer Database, expanding species realized niches
I’m presenting this week’s CPEP seminar at the Center for Climatic Research, UW-Madison (1:00pm, AOSS room 1039, 1225 W. Dayton St., Madison, WI). As before, I’ll post the slides once I get them done. Much of the material will be similar to work I’ve presented at the IBS and in my Yi-Fu seminar, but I’ve been working hard with the 19th Century Forts database over the past little while, and doing some fun things with some AWOS data that I’ll hopefully have time to present. As with all things paleo and historical, you really need to understand how the modern system works to begin to make inferences about the past that are meaningful.
Here is the talk abstract:
Historical data sets for both vegetation and climate exist, covering a time period prior to major land use conversion in the upper Midwestern United States. We aim to improve information about species’ fundamental niches in climate space by extending gridded climate data products to the early 1800s so that they are coincident with early estimates of pre-settlement vegetation in the American Midwest. Here I present work detailing the creation of the gridded data sets and their application to species distribution modelling to show the sensitivity of future suitability maps to added data from historical records.
UPDATE: This presentation is similar enough to my Yi-Fu and IBS talks that I’m not going to put it up on figshare.
