In keeping with the theme of coring pictures I wanted to share PalEON’s new video, produced by the Environmental Change Initiative at the University of Notre Dame. It does a good job of explaining what PalEON does and what we’re all about. There’s also a nice sequence, starting at about 2:40 in, where you get to see a “frozen finger” corer in action. We break up dry ice, create a slurry with alcohol and then drop it into the lake sediment.
Once the sediment has frozen to the sides of the corer (about 10 – 15 minutes) we bring the corer up and remove the slabs of ice from the sides, keeping track of position, dimensions and orientation so that we can piece it back together. I’m on the platform with Jason McLachlan and Steve Jackson.
There’s a great section in there about the sociology of the PalEON project as well, although it’s brief. So take a second and watch the video, it’s great!
I’ve been coding in R since I started graduate school, and I’ve been coding in one way or another since I learned how to use Turing in high school. I’m not an expert by any means, but I am proficient enough to do the kind of work I want to do, to know when I’m stuck and need to ask for help, and to occasionally over-reach and get completely lost in my own code.
I’ve been working hard to improve my coding practice, particularly focusing on making my code more elegant, and looking at ways to use public code as a teaching tool. In submitting our neotoma package paper to Open Quaternary I struggled with the balance between making pretty figures, and making easily reproducible examples, between providing ‘research-quality’ case studies and having the target audience, generally non-programmers, turn off at the sight of walls of code.
There’s no doubt that the quality and interpretability of your code can have an impact on subsequent papers. In 2012 there was a paper in PNAS that showed that papers with more equations often get cited less than similar papers with fewer equations (SciAm, Fawcett and Higginson, 2012). I suspect the same pattern of citation holds for embedded code blocks, although how frequently this happens outside specialized journals isn’t clear to me. It certainly didn’t hurt Eric Grimm’s CONISS paper (doi and PDF) which has been cited 1300+ times, but this may be the exception rather than the rule, particularly in paleoecology.
I’m currently working on a collaboration with researchers at several other Canadian universities. We’re using sets of spatio-temporal data, along with phylogenetic information across kingdoms to do some pretty cool research. It’s also fairly complex computationally. One of the first things I did in co-developing the code was to go through the code-base and re-write it to try to make it a bit more robust to bugs, and to try and modularize it. This meant pulling repeated blocks of code into new functions, taking functions and putting them into their own files, generalizing some of the functions so that they could be re-used in multiple places, and generally spiffing things up with a healthy use of the plyr package.
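To make "pulling repeated blocks of code into new functions" concrete, here is a minimal sketch. It is in Python rather than the R and plyr used in the project, and the site names, values and the summarize() helper are all hypothetical, invented only for illustration.

```python
from statistics import mean

# Hypothetical measurements per site; names and values are made up.
sites = {
    "lake_a": [12.0, 15.0, 9.0],
    "lake_b": [4.0, 6.0],
}


def summarize(values):
    """Return the count, mean and range for one site's measurements."""
    return {
        "n": len(values),
        "mean": mean(values),
        "range": max(values) - min(values),
    }


# Before refactoring, the body of summarize() would have been copied and
# pasted once per site. Now a single loop calls the shared function.
summaries = {}
for name, values in sites.items():
    summaries[name] = summarize(values)
```

The function can live in its own file and be re-used anywhere a site needs summarizing, which is the kind of change described above.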
This, I now realize, was probably something akin to statistical machismo (maybe more like coding machismo). The use of coding ‘tricks’ limited the ability of my co-authors to understand and modify the code themselves. It also meant that further changes to the code required more significant investments in my time. They’re all great scientists, but they’re not native coders in the same way I am (not that I’m in the upper echelon myself).
This has been a real learning experience. Coding for research is not the same as coding in an industrial application. The culture shift in the sciences towards an open-sharing model means that we are no longer doing this work just so that we get output that “works”, but so that the code itself is an output. Collaborative coding should mean that participants in the project should be able to understand what the code does and how it works. In many cases that means recognizing that collaborators are likely to have differing skill sets when it comes to coding and that those different skill sets need to be respected.
In my case, going ahead and re-writing swaths of code certainly helped reduce the size of the code-base, it meant that, in general, things ran more smoothly, and that changes to the code could be accomplished relatively easily. It also meant that I was the only one who could easily make these changes. This is not good collaborative practice, at least, at the outset.
Having said that, there are lots of good reasons why good coding practice can be beneficial, even if some collaborators can’t immediately work through changes. It’s partly a matter of providing road maps, something that is rarely done. Good commenting is useful, but more often lately I’ve been leaning toward trying to map an application with flowcharts. It’s not pretty, but the diagram I drew up for our neotoma package paper (Goring et al., submitted) helped me debug the code and clean up syntax errors, and it gave me an overview I didn’t really have until I drafted it.
I’m working on the same kind of chart for the project I mentioned earlier, although it can be very complicated to do. I’ve also been making an effort to clearly report changes I’ve made using git, so that we have a good record of what’s been done, and why. Certainly, it would have been easier to do all this in the first place, but I’ve learned my lesson. As in most research, a little bit of planning can go a long way.
Writing robust, readable code is an important step toward open and reproducible science, but we need to acknowledge the fact that reproducibility should not be limited to expert coders. Trade-offs are a fact of life in evolution, and yet some of us are unwilling to make the trade-offs in our own scientific practice. We are told constantly to pitch our public talks to the audience. In working with your peers on a collaboration you should respect their relative abilities and ensure that they can understand the code, which may result in eschewing fancier tricks in favor of clearly outlined steps.
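A small sketch of that trade-off, in Python, with hypothetical pollen counts: both versions below compute the same percentages, but the second spells out each step so a collaborator can follow and modify it.

```python
# Hypothetical pollen counts for one sample; values are made up.
counts = {"Pinus": 120, "Quercus": 60, "Betula": 20}

# Compact version: correct, but a lot happens inside one expression.
compact = {t: round(100 * c / sum(counts.values()), 1) for t, c in counts.items()}

# Stepwise version: the same calculation, one idea per line.
total = sum(counts.values())
stepwise = {}
for taxon, count in counts.items():
    percent = 100 * count / total
    stepwise[taxon] = round(percent, 1)
```

Neither version is wrong. The stepwise one is simply easier to read aloud to someone who does not code every day.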
If you code in such a way that people can’t work with you, you are opening yourself up to more bugs, more work and possibly, more trouble down the line. It’s a fine balance, and as quantitative tools such as R and Python become more a part of the lingua franca for grad students and early career researchers, it’s one we’re going to have to negotiate more often.
A couple of weeks ago my colleagues and I submitted a session proposal to ESA (Paleoecological patterns, ecological processes, modeled scenarios: Crossing temporal scales to understand an uncertain future) for the 100th anniversary meeting in Baltimore. I’m very proud of our session proposal. Along with a great topic (and one dear to my heart) we had a long list of potential speakers, but we had to whittle it down to eight for the actual submission.
The speaker list consists of four male and four female researchers, a mix of early career and established researchers from three continents. It wasn’t hard. We were aware of the problem of gender bias, we thought of people whose work we respected, who have new and exciting viewpoints, and who we would like to see at ESA. We didn’t try to shoehorn anybody in with false quotas, we didn’t pick people to force a balance. We simply picked the best people.
Out of the people we invited only two turned us down. While much has been said about higher rejection rates from female researchers (here, and here for the counterpoint), both of the people who turned us down were male, so, maybe we’re past that now?
This is the first time I’ve tried to organize a session and I’m very happy with the results (although I may have jinxed myself!). I think the session will be excellent because we have an excellent speakers list and a great narrative thread through the session, but my point is: It was so easy, there ought to be very little excuse for a skewed gender balance.
PS. Having now been self-congratulatory about gender I want to raise the fact that this speakers list does not address diversity in toto, which has been and continues to be an issue in ecology and the sciences in general. Recognizing there’s a problem is the first step to overcoming our unconscious biases.
Well, I’ve finally made it into a news release for the University of Wisconsin:
The University of Wisconsin-Madison, home of pioneering ecologists who studied lakes, forests, wetlands and prairies, is playing a key role in the next wave of ecological research: large teams of scientists confronting the dilemma of a changing climate on a shrinking planet.
The article summarizes work of two NSF Macrosystems funded projects, GLEON and PalEON (obviously, borrowing on the gold standard of Neon Inc.) and features a quote from me that sounds like something I might have said slightly tongue in cheek: “We’re pollen whisperers,” erm, yeah. . .
Regardless, I like the news release’s thread between the history of the University of Wisconsin and our modern work. As put by Jack Williams:
“Reid Bryson was one of the first to look seriously at climate change, and John Kutzbach produced a groundbreaking set of studies identifying the key causes of past climate change. Thompson Webb, my advisor at Brown, got his Ph.D. here in Madison in 1971 and has been studying paleoclimate ever since.”
Working in Science Hall I’ve always felt well connected to the history of the University, even if I’m only here temporarily. Reid Bryson, John Curtis (Bray-Curtis anyone?), Tom Webb III, and many other people central to the intersection of climate and ecology, shared these halls at some point in the last century. The walls have stayed the same but the ideas have flowed on like pine pollen on a spring breeze.
Much of our work, and the work I’m quoted on (pollen quote aside), has been deeply influenced by David Mladenoff and his lab group who have been working with Public Land Survey data for Wisconsin and the Upper Midwest for some time now. He’s been an invaluable collaborator, even if he’s not in Science Hall.
Anyway, back to prepping for our June Pollen/R course at UMaine. I’ll update soon with some R tricks that experienced users wish they had learned early on.
I’ve posted about rejection before here, and I’m pretty sure everyone on the job market knows what it’s like, but after my two posts about the CNRS search I just thought I’d follow up to say that I didn’t get the position.
Ultimately I went in knowing that success was a long shot, but, when you’re sitting in the Jardin Botanique, eating crudité and remembering the Sainte-Maure de Touraine you had a few days ago, it’s hard not to believe that you’ve got a good shot.
Anyway, rejection sucks. Music rules, but even great music represents a lot of hard work, chances taken and a lot of rejection. I met Zach Rogue once after a show in Vancouver and he seems like a super guy who worked hard to make some great music.
So, after writing the last blog post, futzing, practicing my talk, calling home, saving my talk as a PDF, remembering semi-transparent layers in PowerPoint don’t save properly as PDF, fixing the whole talk again, and then saving the files twice, to two different memory sticks, I headed out to the UPMC. The building itself is one giant interconnected set of hallways, all raised up in the air, and all under construction (apparently), but I got to walk to it through the beautiful botanical gardens (once I got out of an alleyway I inadvertently got myself locked into).
So I arrived at the site of the interview about 20 minutes ahead of time, just in case there was last minute paperwork or anything. It was a pretty nondescript room in the hallway of a geosciences department (from what I could tell), and the two other candidates scheduled in the same block as I was were already there waiting. The jury gets a break just before each session, so before we got started they all came out. They were very nice, and represented a broad cross-section in terms of disciplines.
I was the second to go in. I just had to show my driver’s license (in case I had sent a ringer in my place), sign a paper, and then I was off. 12 minutes and a couple seconds later I was done. I answered my questions in English, but was able to answer questions asked in French (thankfully). And then, after 15 minutes. . . pouf! Done.
It was strangely anticlimactic. There were answers I wanted to explore more, but obviously, there was one other candidate waiting, and I’m sure the jury wanted to go eat dinner, or just sleep!
So that’s it. In a week or so we’ll find out. Had I applied for other sessions this all would have been more complicated. Section 52, the interdisciplinary (human, nature, climate) section doesn’t meet for another two weeks, so it makes coming for an interview much more difficult if you are not on the Continent. I had intended to apply for two sections next year, but I think I might keep it to a minimum, if I do apply again. It’s very difficult to take this much time away from family, friends and work, even if it is, effectively, work-related.
I am currently in Paris waiting to present a 12 minute talk that will help to decide whether I have a full time position with the French Centre National de la Recherche Scientifique, or the CNRS. I wanted to write about the experience briefly because it is quite unique, particularly with respect to the more common tenure-track experience that many of us go through.
The CNRS operates 10 distinct institutes, including the INSB for biological sciences and the INEE for ecological and environmental research. There are also institutes for chemistry, physics, computer sciences and earth and planetary sciences. Each of these institutes operates laboratories, or research stations within France – in fact, they operate over 1,100 research centers across the country.
The interesting thing about the CNRS is that these labs do not hire people individually. As far as I know, you wouldn’t see ads for full time researchers at the Laboratoire des Animaux Méchantes in Nimes (if such a lab existed). Instead, you apply to a particular section of the CNRS (one of 41) associated with your research background (and associated with one or more Institutes) in the hopes of getting a position in that section, and in your lab of choice. These are full time, tenured positions. Each year the CNRS posts the number of positions available to applicants at various career stages (DR1, DR2, CR1, and CR2, in order of seniority) within each section. You submit your package (which also includes a ~10 page summary of research to date, a bunch of biographical information, and some other stuff) through a central website, citing the appropriate sections, and then you wait to see whether you are allowed to continue to the interview stage.
I want to make it clear that it is extremely helpful to have someone within the CNRS system to help you through this process. There are instructions in French and in English that help walk you through the application procedure, but they do not discuss expectations. Had I not known that the research proposal was expected to be very detailed I would have simply submitted something similar to my standard research statement. The same goes for my statement of research to date. It helps to have an insider on your side!
In January of this year I submitted a twenty-something page mid-term research proposal, detailing a set of research goals, outlining the methodologies I would use and providing some of the background for these proposed ideas. This is your typical research statement on steroids. It includes statements on your own abilities, your potential network of expertise and the reasons you would like to work in the lab that you’re asking to be associated with. I applied within Section 30, Continental Surfaces and Interfaces, and I could probably also have applied to the interdisciplinary Section 42 (the section numbers seem to change each year so be careful!). I asked to work with the Centre de Bio-Archéologie et d’Écologie, the CBAE, in Montpellier. A colleague of mine, Odile Peyron, is there now, and there is some excellent work being done in the lab, along with colleagues such as Walter Finsinger, Christelle Hély, and many others. I’ll post the proposal once I find out how I did 🙂
I submitted this package in early January. Out of approximately 120 people who submitted as CR2 applicants to Section 30 this year, about 40 were selected to move on to the interview, and I was one of them. Now, this is the tough part:
They don’t pay for you to come
It’s a 12 minute interview in front of a jury of about 15 senior researchers
Hardly anyone makes it their first time.
Okay. Well, for me it’s still a better than 10% chance, the CNRS position is a full-time, tenured research position, it’s with an excellent group of highly productive researchers at the CBAE, and it’s in the south of France, so my elementary school French Immersion program will finally pay off. But did I mention that the interview is only 12 minutes long? And it’s going to happen in less than 4 hours from now?
Granted, they’ll ask questions for about 15 minutes afterwards, but still. This is completely different from the successive phone interviews, reference letters and then sometimes multi-day interviews that people go through for tenure track jobs. But that multi-day process is replaced by the need to produce a high-quality research document prior to selection.
So, I’ll say again, hardly anyone makes it the first time (although people have), which means that by the time you do get hired (if you’re not culled) they’ll have seen you talk for nearly an hour over successive years. That’s long enough I suppose.
I’m not sure exactly what to expect. I already walked down to the room at the Pierre and Marie Curie University where the interview will be held, just to make sure I don’t get lost. I’ve corrected and re-corrected my talk. I’ve timed it (still a bit under time!) and I’m as ready as I’ll ever be, or at least, as ready-ish as I’ll ever be.
I’d love to hear from other people who have gone through this process though. It was a bit tough finding resources for foreign researchers about the CNRS process, so this could be a useful place to post tips, suggestions and comments. I’ll be happy to expand on my experience if anyone is interested.
I’ll leave you with this. Two of my favorite French (Canadian) songs, by the great Robert Charlebois and by the equally great Stereolab.
Dynamic Ecology had a post recently asking why there wasn’t an Ecology Blogosphere. One of the answers was simply that as ecologists we often recognize the depth of knowledge of our peers and as such, are unlikely (or are unwilling) to comment in an area in which we have little expertise. This is an important point. I often feel like the longer I stay in academia the more I am surprised when I can explain a concept outside my (fairly broad) subject area clearly and concisely. It surprises me that I have depth of knowledge in a subject that I don’t directly study.
Of course, it makes sense. We are constantly exposed to ideas outside our disciplines in seminars, papers, on blogs & twitter, and in general discussions, but at the same time we are also exposed to people with years of intense disciplinary knowledge, who understand the subtleties and implications of their arguments. This is exciting and frightening. The more we know about a subject, the more we know what we don’t know. Plus, we’re trained to listen to other people. We ‘grew up’ academically under the guidance of others, who often had to correct us, so when we get corrected outside our disciplines we are often likely to defer, rather than fight.
This speaks to a broader issue though, and one that is addressed in the latest issue of Frontiers in Ecology and the Environment. The challenges of global change require us to come out of our disciplinary shells and to address challenges with a new approach, defined here as Macrosystems Ecology. At large spatial and temporal scales – the kinds of scales at which we experience life – ecosystems cease being disciplinary. Jim Heffernan and Pat Soranno, in the lead paper (Heffernan et al., 2014) detail three ecological systems that can’t be understood without cross-scale synthesis using multi-disciplinary teams.
The Amazonian rain forest is a perfect example of a region that is imperiled by global change, and can benefit from a Macrosystems approach. Climate change and anthropogenic land use drive vegetation change, but vegetation change also drives climate (and, ultimately, land use decisions). This is further compounded by teleconnections related to societal demand for agricultural products around the world and the regional political climate. To understand and address ecological problems in this region then, we need to understand cross-scale phenomena in ecology, climatology, physical geography, human geography, economics and political science.
Macrosystems proposes a cross-scale effort, linking disciplines through common questions to examine how systems operate at regional to continental scales, and at multiple temporal scales. These problems are necessarily complex, but by bringing together researchers in multiple disciplines we can begin to develop a more complete understanding of broad-scale ecological systems.
Interdisciplinary research is not something that many of us have trained for as ecologists (or biogeographers, or paleoecologists, or physical geographers. . . but that’s another post). It is a complex, inter-personal interaction that requires understanding of the cultural norms within other disciplines. Cheruvelil et al. (2014) do a great job of describing how to achieve and maintain high-functioning teams in large interdisciplinary projects, and Kendra also discusses this further in a post on her own academic blog.
In Goring et al. (2014) we discuss a peculiar issue that is posed by interdisciplinary research. The reward system in academia is largely structured to favor disciplinary research. We refer to this in our paper as a disciplinary silo. You are in a department of X, you publish in the Journal of X, you go to the International Congress of X and you submit grant requests to the X Program of your funding agency. All of these pathways are rewarded, and even though we often claim that teaching and broader outreach are important, they are important inasmuch as you need to not screw them up completely (a generalization, but one I’ve heard often enough).
As we move towards greater interdisciplinarity we begin to recognize that simply superimposing the traditional rewards structure onto interdisciplinary projects (Figure 2) leaves a lot to be desired. This is particularly critical for early-career researchers. We are asking these researchers (people like me) to collaborate broadly with researchers around the globe, to tackle complex issues in global change ecology, but, when it comes time to assess their research productivity we don’t account for the added burden that interdisciplinary research can require of a researcher.
Now, I admit, this is self-serving. As an early career researcher, and member of a large interdisciplinary team (PalEON), much of what we propose in Goring et al. (2014) strongly reflects my own personal experience. Outreach activities, the complexities of dealing with multiple data sources, large multi-authored papers, posters and talks, and the coordination of researchers across disciplines are all realities for me, and for others in the project, but ultimately, we get evaluated on grants and papers. The interdisciplinary model of research requires effort that never gets valued by hiring or tenure committees.
That’s not to say that hiring committees don’t consider this complexity, and I know they’re not just looking for Nature and Science papers, but at the same time, there is a new landscape for researchers out there, and we’re trying to evaluate them with an old map.
In Goring et al. (2014) we propose a broader set of metrics against which to evaluate members of large interdisciplinary teams (or small teams, there’s no reason to be picky). This list of new metrics (here) includes traditional metrics (numbers of papers, size of grants), but expands the value of co-authorship, recognizing that only one person is first in the authorship list, even if people make critical contributions; provides support for non-disciplinary outputs, like policy reports, dataset generation, non-disciplinary research products (white papers, books) and the creation of tools and teaching materials; and adds value to qualitative contributions, such as facilitation roles, helping people communicate or interact across disciplinary divides.
This was an exciting set of papers to be involved with, all arising from two meetings associated with the NSF Macrosystems Biology program (part of NSF BIO’s Emerging Frontiers program). I was lucky enough to attend both meetings, the first in Boulder CO, the second in Washington DC. These are the kinds of meetings that are formative for early-career researchers, and as a post-doctoral researcher I clearly got a lot out of them. The Macrosystems Biology program is funding some very exciting programs, and this Frontiers issue attempts to get to the heart of the Macrosystems approach. It is the result of many hours and days of discussion, and many of the projects are already coming to fruition. It is an exciting time to be an early-career researcher, hopefully you agree!
As an academic you have the advantage of meeting people who do some really amazing research. You also have the advantage of doing really interesting stuff yourself, but you also tend to spend a lot of time thinking about very obscure things. Things that few other people are also thinking about, and those few people tend to be spread out across the globe. I had the opportunity to join researchers from around the world at Queen’s University in Belfast, Northern Ireland earlier this month for a meeting about age-depth models, a meeting about how we think about time, and how we use it in our research.
Time is something that paleoecologists tend to think about a lot. With the Neotoma paleoecological database time is a critical component. It is how we arrange all the paleoecological data. From the Neotoma Explorer you can search and plot out mammal fossils at any time in the recent (last 100,000 years or so) past, but what if our fundamental concept of time changes?
Most of the ages in Neotoma are relative. They are derived from radiocarbon data, either directly, or within a chronology built from several radiocarbon dates (Margo Saher has a post about 14C dating here), which means that there is uncertainty around the ages that we assign to each pollen sample, mammoth bone or plant fossil. To actually get a radiocarbon date you first need to send a sample of organic material out to a lab (such as the Queen’s University, Belfast Radiocarbon Lab). The samples at the radiocarbon lab are processed and put in an Accelerator Mass Spectrometer (Figure 1) where carbon atoms reach speeds of millions of miles an hour, hurtling through a massive magnet, and are then counted, one at a time.
These counts are used to provide an estimate of age in radiocarbon years. We then use the IntCal curve to relate radiocarbon ages to calendar ages. This calibration curve relates absolutely dated material (such as tree rings) to their radiocarbon ages. We need the IntCal curve since the generation of radiocarbon (14C) in the atmosphere changes over time, so there isn’t a 1:1 relationship between radiocarbon ages and calendar ages. Radiocarbon (14C) 0 is actually 1950 (associated with atmospheric atomic bomb testing), and by the time you get back to 10,000 14C years ago, the calendar date is about 1,700 years ahead of the radiocarbon age (i.e., 10,000 14C years is equivalent to 11,700 calendar years before present).
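To make the arithmetic concrete, here is a rough Python sketch of what a calibration lookup does. It is deliberately crude: the toy curve uses only the two anchor points mentioned above (the 1950 zero point, and 10,000 14C years matching 11,700 calendar years), the straight line between them is an assumption for illustration only, and it ignores the uncertainties that real calibration carries. It is not the IntCal curve, and the function names are hypothetical.

```python
# Toy calibration curve: (radiocarbon age, calendar age) pairs, in years BP.
# Only these two anchor points come from the text; the straight line
# between them is an illustrative assumption, NOT the IntCal curve.
TOY_CURVE = [(0.0, 0.0), (10000.0, 11700.0)]


def calibrate(c14_age, curve=TOY_CURVE):
    """Linearly interpolate a calendar age (years BP) from a radiocarbon age."""
    for (c14_lo, cal_lo), (c14_hi, cal_hi) in zip(curve, curve[1:]):
        if c14_lo <= c14_age <= c14_hi:
            fraction = (c14_age - c14_lo) / (c14_hi - c14_lo)
            return cal_lo + fraction * (cal_hi - cal_lo)
    raise ValueError("radiocarbon age lies outside the curve")


def bp_to_calendar_year(age_bp):
    """Ages 'before present' count back from 1950."""
    return 1950 - age_bp


# How far the calendar age runs ahead of the radiocarbon age at 10,000 14C years.
offset = calibrate(10000) - 10000
```

With a real curve the list would hold many more points and each would carry an error term, but the lookup idea is the same.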
To build a model of age and depth within a pollen core, we link radiocarbon dates to the IntCal curve (calibration) and then link each age estimate together, with their uncertainties, using specialized software such as OxCal, Clam or Bacon. This then allows us to examine changes in the paleoecological record through time. Basically, this allows us to do paleoecology.
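The simplest possible age-depth model is linear interpolation between dated levels, sketched below in Python. The depths and ages are hypothetical, and the sketch leaves out the dating uncertainties entirely; tools like Clam and Bacon do considerably more than this.

```python
# Hypothetical dated levels in a core: (depth in cm, calibrated age in years BP).
DATED_LEVELS = [(0.0, 0.0), (100.0, 2000.0), (250.0, 8000.0)]


def age_at_depth(depth, levels=DATED_LEVELS):
    """Estimate the age at a depth by interpolating between dated levels."""
    for (d_top, age_top), (d_bottom, age_bottom) in zip(levels, levels[1:]):
        if d_top <= depth <= d_bottom:
            fraction = (depth - d_top) / (d_bottom - d_top)
            return age_top + fraction * (age_bottom - age_top)
    raise ValueError("depth lies outside the dated part of the core")


# Assign ages to a few undated pollen sample depths (also hypothetical).
sample_depths = [50.0, 175.0]
sample_ages = [age_at_depth(d) for d in sample_depths]
```

Every pollen sample between two dates inherits an age from the model, which is why a change in the dates or the calibration curve shifts the whole record.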
A case for updating chronologies
The challenge for a database like Neotoma is that the IntCal curve changes over time (IntCal98, IntCal04, IntCal09, and now IntCal13) and our idea of what makes an acceptable age model (and what constitutes acceptable material for dating) also changes.
If we’re serving up data to allow for broad scale synthesis work, which age models do we provide? If we provide the original published model only then these models can cause significant problems for researchers working today. As I mentioned before, by the time we get back 10,000 14C years the old models (built using only 14C ages, not calibrated ages) will be out of sync with newer data in the database, and our ability to discern patterns in the early-Holocene will be affected. Indeed, identical cores built using different age models and different versions of the IntCal curve could tell us very different things about the timing of species expansions following glaciation, or changes in climate during the mid-Holocene due to shifts in the Intertropical Convergence Zone (for example).
So, if we’re going to archive these published records then we ought to keep the original age models, they’re what’s published after all, and we want to keep them as a snapshot (reproducible science and all that). However, if we’re going to provide this data to researchers around the world, and across disciplines, for novel research purposes then we need to provide support for synthesis work. This support requires updating the calibration curves, and potentially, the age-depth models.
So we get (finally) to the point of the meeting. How do we update age-models in a reliable and reproducible manner? Interestingly, while the meeting didn’t provide a solution, we’re much closer to an endpoint. Scripted age-depth modelling software like Clam and Bacon makes the task easier, since it provides the ability to numerically reproduce output directly in R. The continued development of the Neotoma API also helps facilitate this task since it again would allow us to pull data directly from the database, and reproduce age-model construction using a common set of data.
One thing that we have identified, however, is the set of current limitations to this task. Quite simply, there’s no point in updating some age-depth models. The lack of reliable dates (or of any dates) means that new models will be effectively useless. The lack of metadata in published material is also a critical concern. While some journals maintain standards for the publication of 14C dates they are only enforced when editors or reviewers are aware of them, and are difficult to enforce post publication.
The issue of making data open and available continues to be an exciting opportunity, but it really does reveal the importance of disciplinary knowledge when exploiting data sources. Simply put, at this point if you’re going to use a large disciplinary database, unless you find someone who knows the data well, you need to hope that signal is not lost in the noise (and that the signal you find is not an artifact of some other process!).