As an academic you have the advantage of meeting people who do some really amazing research. You also get to do really interesting stuff yourself, but you tend to spend a lot of time thinking about very obscure things, things that few other people are thinking about, and those few people tend to be spread out across the globe. Earlier this month I had the opportunity to join researchers from around the world at Queen’s University in Belfast, Northern Ireland, for a meeting about age-depth models: a meeting about how we think about time, and how we use it in our research.
Time is something that paleoecologists tend to think about a lot. In the Neotoma paleoecological database, time is a critical component. It is how we arrange all the paleoecological data. From the Neotoma Explorer you can search and plot out mammal fossils at any time in the recent past (the last 100,000 years or so), but what if our fundamental concept of time changes?
Most of the ages in Neotoma are relative. They are derived from radiocarbon data, either directly or within a chronology built from several radiocarbon dates (Margo Saher has a post about 14C dating here), which means that there is uncertainty around the ages that we assign to each pollen sample, mammoth bone or plant fossil. To actually get a radiocarbon date you first need to send a sample of organic material out to a lab (such as the Queen’s University, Belfast Radiocarbon Lab). The samples at the radiocarbon lab are processed and put in an Accelerator Mass Spectrometer (Figure 1), where atoms of carbon reach speeds of millions of miles an hour, hurtle through a massive magnet, and are then counted, one at a time.
These counts are used to provide an estimate of age in radiocarbon years. We then use the IntCal curve to relate radiocarbon ages to calendar ages. This calibration curve relates absolutely dated material (such as tree rings) to its radiocarbon age. We need the IntCal curve because the production of radiocarbon (14C) in the atmosphere changes over time, so there isn’t a 1:1 relationship between radiocarbon ages and calendar ages. Radiocarbon (14C) year 0 is actually 1950 (associated with atmospheric atomic bomb testing), and by the time you get back to 10,000 14C years ago, the calendar age is about 1,700 years older than the radiocarbon age (i.e., 10,000 14C years is equivalent to 11,700 calendar years before present).
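To make the calibration step a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how OxCal or any other package is implemented: the "curve" is a toy straight line through the two points mentioned above (0 and 10,000 14C years = 11,700 calendar years), the 30-year curve error is invented, and the function names are hypothetical. The idea it shows is the standard one: for each candidate calendar age, ask how well the curve's radiocarbon age matches the measurement, given both uncertainties.

```python
import math

# Toy calibration curve: (calendar age in cal BP, 14C age, curve error).
# ILLUSTRATIVE ONLY, not IntCal: a straight line through the two points
# quoted in the text, with an assumed (made-up) curve error of 30 years.
TOY_CURVE = [(0, 0, 30), (11700, 10000, 30)]


def curve_at(cal_bp, curve):
    """Linearly interpolate the curve's 14C age and error at a calendar age."""
    for (t0, mu0, s0), (t1, mu1, s1) in zip(curve, curve[1:]):
        if t0 <= cal_bp <= t1:
            w = (cal_bp - t0) / (t1 - t0)
            return mu0 + w * (mu1 - mu0), s0 + w * (s1 - s0)
    raise ValueError("calendar age outside the curve")


def calibrate(c14_age, c14_error, curve, step=10):
    """Return [(cal_bp, probability)] over a grid of calendar ages.

    Each calendar age is weighted by a normal density comparing the measured
    14C age with the curve, with measurement and curve variances added.
    """
    start, end = curve[0][0], curve[-1][0]
    grid = list(range(start, end + 1, step))
    density = []
    for t in grid:
        mu, sigma = curve_at(t, curve)
        var = c14_error ** 2 + sigma ** 2
        density.append(math.exp(-((c14_age - mu) ** 2) / (2 * var)) / math.sqrt(var))
    total = sum(density)
    return [(t, d / total) for t, d in zip(grid, density)]


def cal_bp_to_year(cal_bp):
    """Convert cal BP to a calendar year, using the 1950 'present'.

    Negative results are years before year 0 in astronomical numbering.
    """
    return 1950 - cal_bp


# Example: a hypothetical measurement of 5000 +/- 40 14C years.
posterior = calibrate(5000, 40, TOY_CURVE)
mode_bp, _ = max(posterior, key=lambda pair: pair[1])
print(mode_bp, cal_bp_to_year(mode_bp))
```

With the real IntCal curve the wiggles in atmospheric 14C mean the resulting distribution is often lumpy or multi-peaked, which is exactly why a single radiocarbon age can map onto a wide range of calendar ages.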
To build a model of age and depth within a pollen core, we link radiocarbon dates to the IntCal curve (calibration) and then link the age estimates together, with their uncertainties, using specialized software such as OxCal, Clam or Bacon. This allows us to examine changes in the paleoecological record through time. Basically, it allows us to do paleoecology.
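As a rough sketch of what "linking the age estimates together" can look like, the Python below builds the simplest kind of age-depth model: linear interpolation between dated levels, repeated many times with ages drawn from their uncertainties. The depths, ages and errors are hypothetical, the function names are my own, and treating each calibrated age as a normal distribution is a simplifying assumption (real calibrated ages are rarely that tidy). Clam and Bacon do considerably more than this.

```python
import random

# Hypothetical dated levels, sorted by depth:
# (depth in cm, calibrated age in cal BP, 1-sigma error in years).
DATED_LEVELS = [
    (10, 500, 40),
    (120, 3200, 60),
    (250, 7400, 80),
    (380, 11200, 120),
]


def interpolate_age(depth, depths, ages):
    """Linearly interpolate an age between the two bracketing dated depths."""
    if not depths[0] <= depth <= depths[-1]:
        raise ValueError("depth outside the dated interval (no extrapolation)")
    for i in range(len(depths) - 1):
        d0, d1 = depths[i], depths[i + 1]
        if d0 <= depth <= d1:
            w = (depth - d0) / (d1 - d0)
            return ages[i] + w * (ages[i + 1] - ages[i])


def age_depth_model(levels, sample_depths, n_iter=2000, seed=1):
    """Monte Carlo age-depth model.

    Draws an age for every dated level, rejects draws where ages do not
    increase with depth, interpolates, and returns for each sample depth
    a (lower, median, upper) tuple spanning roughly the central 95%.
    """
    rng = random.Random(seed)
    depths = [d for d, _, _ in levels]
    draws = {d: [] for d in sample_depths}
    kept = 0
    while kept < n_iter:
        ages = [rng.gauss(age, err) for _, age, err in levels]
        if any(a1 <= a0 for a0, a1 in zip(ages, ages[1:])):
            continue  # age reversal: sediment cannot get younger with depth
        kept += 1
        for d in sample_depths:
            draws[d].append(interpolate_age(d, depths, ages))
    summary = {}
    for d, values in draws.items():
        values.sort()
        lower = values[int(0.025 * n_iter)]
        median = values[n_iter // 2]
        upper = values[int(0.975 * n_iter) - 1]
        summary[d] = (lower, median, upper)
    return summary


# Estimated ages, with uncertainty, for three pollen samples.
for depth, (lo, mid, hi) in age_depth_model(DATED_LEVELS, [50, 200, 300]).items():
    print(depth, round(lo), round(mid), round(hi))
```

Because the random seed is fixed, rerunning the script reproduces the same intervals, which is the property that makes scripted age models attractive for a database.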
A case for updating chronologies
The challenge for a database like Neotoma is that the IntCal curve changes over time (IntCal98, IntCal04, IntCal09, and now IntCal13) and our idea of what makes an acceptable age model (and what constitutes acceptable material for dating) also changes.
If we’re serving up data to allow for broad scale synthesis work, which age models do we provide? If we provide only the original published model, then these models can cause significant problems for researchers working today. As I mentioned before, by the time we get back 10,000 14C years the old models (built using only 14C ages, not calibrated ages) will be out of sync with newer data in the database, and our ability to discern patterns in the early Holocene will be affected. Indeed, identical cores built using different age models and different versions of the IntCal curve could tell us very different things about, for example, the timing of species expansions following glaciation, or changes in climate during the mid-Holocene due to shifts in the Intertropical Convergence Zone.
So, if we’re going to archive these published records then we ought to keep the original age models. They’re what’s published after all, and we want to keep them as a snapshot (reproducible science and all that). However, if we’re going to provide these data to researchers around the world, and across disciplines, for novel research purposes, then we need to provide support for synthesis work. This support requires updating the calibration curves and, potentially, the age-depth models.
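One way to picture "keep the original, but also serve an updated model" is as a record that holds several chronologies side by side, each tagged with how it was built. The Python below is a hypothetical sketch of that idea only. The class and field names are invented for illustration and are not Neotoma's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chronology:
    name: str
    curve: Optional[str]  # e.g. "IntCal13"; None for uncalibrated 14C years
    software: str         # how the model was built
    is_original: bool     # True for the published, archived snapshot


@dataclass
class CoreRecord:
    site: str
    chronologies: List[Chronology] = field(default_factory=list)

    def add(self, chron: Chronology) -> None:
        """Add a chronology; the published original can only be archived once."""
        if chron.is_original and any(c.is_original for c in self.chronologies):
            raise ValueError("the original chronology is already archived")
        self.chronologies.append(chron)

    def original(self) -> Optional[Chronology]:
        """The published model, kept untouched for reproducibility."""
        return next((c for c in self.chronologies if c.is_original), None)

    def for_synthesis(self, curve: str) -> Optional[Chronology]:
        """Most recently added non-original model built with the given curve."""
        matches = [c for c in self.chronologies
                   if c.curve == curve and not c.is_original]
        return matches[-1] if matches else None


# Hypothetical core with a published 14C-only model and a rebuilt one.
core = CoreRecord(site="Example Lake")
core.add(Chronology("published", None, "hand-drawn", is_original=True))
core.add(Chronology("rebuilt", "IntCal13", "Clam", is_original=False))
print(core.original().name, core.for_synthesis("IntCal13").name)
```

The point of the design is that nothing is overwritten: someone reproducing the original paper and someone doing a continental-scale synthesis can both ask the same record for the chronology they need.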
So we get (finally) to the point of the meeting. How do we update age models in a reliable and reproducible manner? Interestingly, while the meeting didn’t provide a solution, we’re much closer to an endpoint. Scripted age-depth modelling software like Clam and Bacon makes the task easier, since it provides the ability to numerically reproduce output directly in R. The continued development of the Neotoma API also helps facilitate this task, since it would allow us to pull data directly from the database and reproduce age-model construction using a common set of data.
One thing that we have identified, however, is the current set of limitations to this task. Quite simply, there’s no point in updating some age-depth models. The lack of reliable dates (or of any dates) means that new models will be effectively useless. The lack of metadata in published material is also a critical concern. While some journals maintain standards for the publication of 14C dates, those standards are only enforced when editors or reviewers are aware of them, and they are difficult to enforce after publication.
Making data open and available continues to be an exciting opportunity, but it really does reveal the importance of disciplinary knowledge when exploiting data sources. Simply put, at this point if you’re going to use a large disciplinary database, unless you find someone who knows the data well, you need to hope that the signal is not lost in the noise (and that the signal you find is not an artifact of some other process!).