downwithtime

A Hiatus of Sorts

There was a time when I aspired to post weekly, and then bi-weekly, maybe once a month, and now. . . very little.

That’s not to say I haven’t been busy. I’ve moved a lot of content over to a research website at http://goring.org, with a nice picture of the Wicked Witch of the East (that ought to be properly credited, I’ll fix that. . . ).

I’ve also moved my social advocacy stuff to either Twitter, or institutionally within EarthCube, where I’m now a member of the Leadership Council & head of the Engagement Committee.

It all comes down to time. I find that it takes me a lot of time to write a blog post, and my time seems to be stretched thinner than it was before. Something needed to give, and, in the end it was either my sanity & work-life balance, or this blog (and a couple other things).

I stand by all my old posts, I think they’re all equally fantastic, but please check out my website for more current information on what I’m up to. Thanks for stopping by.

Who is a Scientist – Reflections on #AAG2016

This is the first time I’ve really been to the American Association of Geographers meeting. Last year it was held in Chicago, which is really close to Madison, and I was invited to speak at a session called “The View from the Anthropocene” organized by two great Geographers from the University of Connecticut, Kate Johnson and Megan Hill, but I had the kids & really only spent the morning there. I’m pretty sure there was another reason as well, but I can’t remember what it was.

It was great to see Kate & Megan again this year, both of them are doing really cool stuff (check out their web pages), and it was really great to see that the momentum behind the original idea was enough to re-tool the session into the Symposium on Physical Geography at this year’s AAG meeting in San Francisco, with Anne Chin of UC-Denver on the organizing committee, and a host of fantastic speakers.

My own presentation in the session focused on the Anthropocene, and its role as both a boundary (whether you want to define it as an Epoch sensu stricto or as a philosophical concept – I think that Stanley Finney & Lucy Edwards’ article in GSA Today nicely lays out the arguments) and a lens. The second part of that equation (the lens) is a more diffuse point, but my argument is that the major changes we see in the earth system can impact our ability to build models of the past using modern analogues, whether those be climatic, or biological. I show this using pollen and vegetation records from the Midwest, and make the connection to future projections with the example laid out in Matthes et al. (2015), where we show that the pre-industrial climate niches of plant functional types used in GCMs as part of the CMIP5 intercomparison are not better than random when compared to actual “pre-settlement” vegetation in the northeastern United States.

But I really want to talk about a single slide in my talk. In the early part of my talk I use this slide:

MGDavisSlide

This is Margaret Davis, one of the most important paleoecologists in North America, past-president of the ESA [PDF], and, importantly, a scientist who thought deeply about our past, our present and our future. There’s no doubt she should be on the slide. She is a critical piece of our cultural heritage as scientists, and, because of her research, is uniquely well suited to show up in a slide focusing on the Anthropocene.

But it’s political too. I put Margaret Davis up there because she’s an important scientist, but I also chose her because she’s an important female scientist. People specifically commented on the fact that I chose a female scientist, because it’s political. It shouldn’t be. There should be no need for me to pick someone because of their gender, and there should be no reason to comment on the fact that she was a female scientist. It should just “be”.

Personal actions should be the manifestation of one’s political beliefs, but so much of our day to day life passes by without contemplation. Susanne Moser, later in my session, talked about the psychological change necessary to bring society around to the task of reducing CO2, of turning around the Anthropocene, or surviving it, and I think that the un-examined life is a critical part of the problem. If we fail to take account of how our choices affect others, or affect society then we are going to confront an ugly future.

Everything is a choice, and the choices we make should reflect the world we want for ourselves and for the next generations. If our choices go un-examined then we wind up with the status quo. We wind up with unbalanced panels, continued declines in under-represented minority participation in the physical sciences, and an erosion of our public institutions.

This post is maybe self-serving, but it shouldn’t have to be. We shouldn’t have to look to people like DN Lee, the authors of Tenure She Wrote, Chanda Prescod-Weinstein, Terry McGlynn, Margaret Kosmala, Oliver Keyes, Jacquelyn Gill and so many others who advocate for change within the academic system, often penalizing themselves in the process. We should be able to look to ourselves.

Okay, enough soap-boxing. Change yourselves.

Semantics Shememantics

In science we work, more often than not, in teams. Whether we work with one other individual, five individuals, or interact at workshops with hundreds of strangers, it’s important that we are clearly understood. Clarity is critical, especially when explaining complex concepts. KISS is my second favorite acronym, even if I can’t keep to the principle (NEFLIS, a camping acronym, is my favorite – No Excuse for Living in Squalor just because you’re out in the woods).

A recently funded project I’m working on, under the aegis of EarthCube, is the harmonization of the Neotoma Paleoecological Database and the Paleobiology Database. Neotoma is a database of Quaternary fossils (mammals and microfossils such as pollen and ostracodes), and the Paleobiology Database is a database of every other kind of fossil. Both are incredible repositories for their respective communities, and powerful research tools in their own right. My Distinguished Lecture talk at the University of Wisconsin’s Rebecca J. Holz Research Data Symposium was about the role of Community Databases in connecting researchers to Big Data tools, while getting their data into a curated form so that others could easily access and transform their data to undertake innovative research projects.

Superman Card Game by Whitman (1978) - G by andertoons, on Flickr — Figure 1. Semantic differences can be kryptonite for a project. Especially a project that has very short arms relative to the rest of its body like Superman does in this picture. [ credit: andertoons ]

Our recent joint Neotoma-PBDB workshop, in Madison WI, showed me that, even with such closely allied data and disciplinary backgrounds, semantics matter. We spent the first morning of the meeting having a back and forth discussion, where it kept seeming like we agreed on core concepts, but then, as the conversations progressed, we’d fall back into some sort of confusion. As it began to seem unproductive we stepped back and checked in to see if we really were agreeing on core concepts.

While both databases contain fossil data, there is a fundamental difference in how the data are collected. Typically Paleobiology DB data is collected in isolation: a fossil whale, discovered & reported, is more common than a vertical stratigraphic survey on a single outcrop at a specific latitude and longitude. In Neotoma, so much of our data comes from lake sediment cores that it makes sense that much of our data (and data collection) is described from stratigraphic sequences.

This difference may not seem like much, especially when the Holocene (the last 11,700 years) is basically an error term in much of the Paleobiology Database, but it’s enough of a difference that we were apparently agreeing on much, but then, inexplicably, disagreeing on followup discussions.

This is one of the fundamental problems in interdisciplinary research. Interdisciplinarity is as much understanding the terms another discipline uses as it is understanding the underlying philosophy and application of those terms in scientific discourse. Learning to navigate these differences is time consuming, and requires a skill set that many of us don’t have. At the very least, recognizing this is a problem, and learning to address this issue is a skill that is difficult to master. It took us a morning of somewhat productive discussion before we really looked at the problem. Once addressed we kept going back to our draft semantic outline to make sure we knew what we were talking about when discussing each other’s data.

This is all to say, we had a great workshop and I’m really looking forward to the developments on the horizon. The PBDB development team is great (and so is the Neotoma team) and I think it’s going to be a very fruitful collaboration.

See you at #AGU2015

I’m heading to AGU early this year, part of the Neotoma Annual Meeting at Berkeley. We’ve recently been awarded an NSF EarthCube Integrated Activities award to harmonize Neotoma and the Paleobiology Database (and other allied paleobiological archives), but we’ve also made some big gains in working with allied Plio-Pleistocene databases and researchers across the globe in adding to Neotoma’s already considerable data holdings.

I’m looking forward to the upcoming Neotoma meeting. One very exciting development is our partnership with the University of Wisconsin’s Library System. We’ve been working toward providing data contributors with persistent Digital Object Identifiers (DOIs) for their contributions. Our work with the UW Library System has seen a new contract established between the UW Libraries and DataCite, which gives us access not only to DOI minting, but also new connections to a set of established metadata standards and a robust API for minting DOIs, editing metadata and searching for contributions.

My own poster [GC11E-1065] goes up on Monday Morning (so get registered early!) as part of the Dating the Anthropocene session. There was a great paper in Eos in the last issue (What is the Anthropocene? by L. Edwards) that lays out some options for ways in which geoscientists might treat the Anthropocene. My work in the Upper Midwestern United States leads me to believe that there is a clear and persistent signal of human agency on the landscape, but it’s time transgressive, and it varies. It was nice to see Edwards point to some of the pioneering work by the great Canadian palynologist Jock McAndrews, but the signal of EuroAmerican settlement is so broad it clearly represents a state change (see our pre-print here, now in review), as opposed to the signal of earlier human land use, nicely reviewed in Munoz et al (2014). If you want to talk about it, come find me on Monday!

Jack Williams will be presenting some of the work we’ve been doing with Neotoma with his poster on Tuesday afternoon in the poster session [IN23B-1731] for Facilitating Open Science and Data through Scientist Engagement and Evolving Technology.

WilliamsPoster

Alan Ashworth is also talking about Neotoma in another presentation on Tuesday from 9:15 – 9:30 in the Agile Curation, Data Access and Infrastructure, and Data Layers session. This, incidentally, is a session that I am convening, along with Denise Hills of the Alabama Geological Survey and Marjorie Chan of the University of Utah, the two chairs of the EarthCube Engagement Team. I am also convening the Facilitating Open Science through Engagement and Evolving Technology poster session on Tuesday afternoon. Some great opportunities to find out about both EarthCube sponsored open data initiatives & the broad range of data and science platforms that are being developed.

So, once again, a very busy AGU for me, and I haven’t even mentioned Kevin Burke’s great poster on Monday morning. Having finished his M.Sc Kevin is on to more great work modeling the influence of changing wind fields on our ability to reconstruct past vegetation from pollen data.

Hope to see you at AGU, I’ll be there from Friday to Wednesday, so if you’d like to get in touch, let me know!

The interdisciplinary study of organic walled microfossils: A ramble.

Figure 1. That’s really a lot of pollen. A lot of pollen. Image by Brooke Novak

It’s no secret to members of the Canadian Association of Palynologists (join now!) that the study of organic-walled microfossils is the most interesting branch of science, but it may come as a surprise to some of our colleagues. The thing is, our colleagues all have their own opinions. If they’re in biology departments they probably like bears; geology, they probably like different kinds of gravel; geography, obviously they like the names of rivers and knowing where towns and cities are. The reality of being a palynologist is that you’re often working in a department that specializes in something that isn’t palynology. From time to time this can be a curse, but it’s also a very exciting opportunity.

This year I had the pleasure of attending both the Ecological Society of America meeting and the Geological Society of America meeting, both of which were held in Baltimore, Maryland (39.2833° N, 76.6167° W for the Geographers). At both meetings I co-chaired a session titled “Paleoecological patterns, ecological processes, modeled scenarios: crossing temporal scales to understand an uncertain future”. The sessions highlighted the applications of paleoecology to understanding the processes of ecological and geophysical change across decadal to millennial time scales.

It really is a testament to palynologists (and paleoecologists more broadly) that neither session felt out of place in either ESA or GSA. The nature of the problems we address through our research relies on the integration of ecological knowledge and geophysical process. Both sessions had impressive contributions from early-career researchers and established researchers, and both sessions pointed to new and unexplored avenues of research. Both sessions also showcased a bit of the flavor of the meetings themselves. The ESA talks focused more on ecological processes, the accumulation of carbon in ecosystems, forest cover change and regional dynamics, and change within ecological systems. At the GSA meeting there was a much heavier imprint of climate and deeper time scales.

In recent years paleoecology has become more visible to ecologists as they have begun to tackle the complex problems of predicting community change under various climate change scenarios. At the same time, questions of carbon dynamics, vegetation-atmosphere feedbacks, and other large scale questions of relevance to geoscientists have increasingly drawn from the knowledge of paleoecologists and palynologists. Of course, there is a long tradition of paleoecologists contributing significantly to interdisciplinary sciences. Palynologists have been using their unique view of the earth system over long time scales to help frame our understanding of the Earth’s past as far back as von Post (see Conway’s overview of von Post’s work in the New Phytologist).

Figure 2. Group photo from the First International Conference on Palynology. Palynologists have made great strides in improving gender parity since this time, but Margaret Davis is visible front center. [link from PALYNOS]

Palynology is great precisely because the people studying it continue to pursue innovative and exciting research that borrows strongly from our history as a deeply interdisciplinary discipline. It is this interdisciplinary history that allows us to present our work to Foresters, Ecologists, Geologists, Climatologists, or Oceanographers. We have to be a little bit of all of these in order to make sense of the microscopic organic-walled microfossils that we see dancing under the microscope. [note: if they really are dancing you should cut down on the silicone oil]

Helping to fill the cloud from the bottom up.

Open data in the sciences is an aspirational goal, and one that I wholeheartedly agree with. The efforts of EarthCube (among others) to build an infrastructure of tools to help facilitate data curation and discovery in the Earth Sciences have been fundamental in moving this discussion forward in the geosciences, and the most recent ESA meeting saw the development of a new section of the society dedicated to Open Science.

One of the big challenges to open science is that making data easily accessible and easily discoverable can be at odds with one another. Making data “open” is as easy as posting it on a website, but making it discoverable is much more complex. Borgman and colleagues (2007) very clearly lay out a critical barrier to data sharing in an excellent paper examining practices in “habitat ecology” (emphasis mine):

Also paradoxical is researchers’ awareness of extant metadata standards for reconciling, managing, and sharing their data, but their lack of use of such standards. The present dilemma is that few of the participating scientists see a need or ability to use others’ data, so they do not request data, they have no need to share their own data, they have no need for data standards, and no standardized data are available. . .

The issue, as laid out here, is that people know that metadata standards exist, but they’re not using them from the ground up because they’re not using other people’s data. Granted this paper is now eight years old, but, for the vast majority of disciplinary researchers in the geosciences and biological sciences the extent of data re-use is most likely limited to using data tables from publications, if that. [a quick plug, if you’re in the geosciences, please take a few minutes to complete this survey on data sharing and infrastructure as part of the EarthCube Initiative]

So, for many people who are working with self-styled data formats, and metadata that is largely implicit (they’re the only ones who really understand their multi-sheet excel file), getting data into a new format (one that conforms to explicit metadata standards) can be formidable, especially if they’re dealing with a large number of data products coming out of their research.

Right now, for just a single paper I have ten different data products that need to have annotated metadata. I’m fully aware that I need to do it, I know it’s important, but I’ve also got papers to review and write, analysis to finish, job applications to write, emails to send, etc., etc., etc., and while I understand that I can now get DOIs for my data products, it’s still not clear to me that it really means anything concrete in terms of advancement.

Don’t get me wrong, I am totally for open science, all my research is on GitHub, even partial papers, and I’m on board with data sharing. My point here is that even for people who are interested in open science, correctly annotating data is still a barrier.

How do we address this problem? We have lots of tools that can help generate metadata, but many, if not all, of these are post hoc tools. We talk extensively, if colloquially, about the need to start metadata creation at the same time as we collect the data, but we don’t incentivise this process. The only time people realize that metadata is important is at the end of their project, and by then they’ve got a new job to start, a new project to undertake, or they’ve left academia.

Making metadata creation a part of the research workflow is something I am working toward as part of the Neotoma project, where metadata is a necessary component of the actual data analysis. The Neotoma Paleoecological Database is a community curated database that contains sixteen different paleoecological proxies, ranging from water chemistry to pollen to diatoms to stable isotope data (see Pilaar Birch and Graham 2015). Neotoma has been used to study everything from modern patterns of plant diversity, to rates of migration for plants and mammals, rates of change in community turnover through time, and species relationships to climate. It acts as both a data repository and a research tool in and of itself. A quick plug as well, the completion of a workshop this past week with the Science Education Resource Center at Carleton College in Minnesota has resulted in the development of teaching tools to help bring paleoecology into the classroom (more are on their way).

Neotoma has a database structure that includes a large amount of metadata. Due in no small part to the activities of Eric Grimm, the metadata is highly curated, and Tilia, a GUI tool for producing stratigraphic diagrams and age models from paleoecological data, is designed to store data in a format that is largely aligned with the Neotoma database structure.

In designing the neotoma package for R I’ve largely focused on its use as a tool to get data out of Neotoma, but the devastating closure of the Illinois State Museum by the Illinois Governor (link) has hastened the devolution of data curation for the database. The expansion of the database to include a greater number of paleoecological proxies has meant that a number of researchers have already become data stewards, checking to ensure completeness and accuracy before data is uploaded into the database.

Having the R package (and the Tilia GUI) act as a tool to get data in as well as out serves an important function: it enhances the benefits of proper data curation immediately after (or even during) data generation, because the data structures in the applications are so closely aligned with the actual database structure.

We are improving this data/metadata synergy in two ways:

Data structures: The data structures within the R package (detailed in our Open Quaternary paper) remain parallel to the database. We’re also working with Mark Uhen, Shanan Peters and others at the Paleobiology Database (as part of this funded NSF EarthCube project) and with efforts elsewhere, for example the LiPD Project, which is itself proposing community data standards for paleoclimatic data (McKay and Emile-Geay, 2015).
Workflow: Making paleoecological analysis easier through the use of the R package has the benefit of reducing the ultimate barrier to data upload. This work is ongoing, but the goal is to ensure that by creating data objects in neotoma, data is already formatted correctly for upload to Neotoma, reducing the burden on Data Stewards and on the data generators themselves.
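To make the workflow idea a little more concrete, here is a minimal sketch in Python of what "metadata as part of the data object" could look like. This is not the neotoma R package or the Neotoma schema: the class name `DatasetDraft`, the field names in `REQUIRED_METADATA`, and the example site are all hypothetical, chosen only to illustrate checking for missing metadata while the data is still being generated rather than at the end of a project.

```python
from dataclasses import dataclass, field

# ASSUMPTION: an illustrative list of required fields, not the real
# Neotoma metadata schema.
REQUIRED_METADATA = ("site_name", "latitude", "longitude",
                     "dataset_type", "collector")


@dataclass
class DatasetDraft:
    """A hypothetical data object that carries its metadata with it."""
    counts: dict                      # e.g. taxon name -> count for a sample
    metadata: dict = field(default_factory=dict)

    def missing_metadata(self):
        """List required fields that are absent or empty, in schema order."""
        return [key for key in REQUIRED_METADATA
                if self.metadata.get(key) in (None, "")]

    def ready_for_upload(self):
        """True only once every required metadata field has a value."""
        return not self.missing_metadata()


# Usage: the draft reports what is missing as soon as it is created.
draft = DatasetDraft(counts={"Pinus": 12},
                     metadata={"site_name": "Example Lake", "latitude": 49.3})
print(draft.missing_metadata())
```

The point of the design is simply that the check lives next to the data, so the gaps show up during analysis instead of during a last-minute upload.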

This is a community-led initiative. Although development is centralized (but open: anyone can contribute to the R package, for example), the user base of Neotoma is broad. It contains data from over 6500 researchers, and data is contributed at a rate that continues to increase. By working directly with the data generators we can help build a direct pipeline into “big data” tools for researchers that have traditionally been somewhere out on the long tail.

Jack Williams will be talking a bit more about our activities in this Middle Tail, and why it’s critical for the development of truly integrated cyberinfrastructure in the geosciences (the lessons are applicable to ecoinformatics as well) at GSA this year (I’ll be there too, so come by and check out our session: Paleoecological Patterns, Ecological Processes and Modeled Scenarios, where we’re putting the ecology back in ge(c)ology!).

People, climate, fire. The Future meets the Past, and decides it wants to do its own thing.

I’ve been very lucky to work with great co-authors over the past few years, and this year is no exception. Along with a raft of papers we are about to submit I just got notified that a paper we submitted a few months ago is now online in the Annals of the Association of American Geographers (journal title chosen in the pre-Twitter age, obv.).

This paper, with Megan Walsh, Jenn Marlon, Dan Gavin and Kendrick Brown (and also me!) is a great look at fire records from the Pacific Northwest (PNW) over the last 5000 years. This time period is particularly critical to understanding human-climate-fire relationships. Human populations over the last 5000 years were increasing in the region, and climate was shifting, gradually cooling and becoming more moist following the early Holocene xerothermic period.

Figure 1. Wildfires burn in the Pacific Northwest, even with its reputation as a wet region. In the past, as today, humans likely played a role, along with climate. Image from Wikimedia Commons.

The challenge is that the PNW is an incredibly heterogeneous region. You can hike nine kilometers and shift from dry valley bottom to an alpine peak. Most of the climatic gradients are more homogeneous on the NS axis than they are on the EW axis. The more critical problem is that, in a more general sense, human activity, on a landscape scale, is very difficult to detect or attribute prior to widespread EuroAmerican colonization (see Sam Munoz’s excellent paper here).

Our paper comes out at a particularly important time. We’ve seen an incredible fire season in the Pacific Northwest this year, driven in part by very dry conditions during the summer but, more importantly, by low snow packs in the winter. Knowing that past fire regimes in the region increased, even as temperatures cooled through the late Holocene, has serious implications for the future. Biomass stocks in the PNW remain high (even following widespread logging), and the open fire-dominated forests that were adapted to warmer drier conditions of the early Holocene (mostly Douglas fir-dominated) are no longer established in the region.

We may be burning ourselves to a new ecological baseline.

Amy Hessl, who was a co-convener on our fantastic ESA session this year (link), has a nice paper in Ecological Applications from 2004 linking the Pacific Decadal Oscillation to fire activity in the PNW interior (here) based on a set of fire scar data, so changing the intensity and frequency of these climatic systems is certainly going to shift fire frequency and intensity. Wimberly and Liu (2014) support the idea that management focused on reducing biomass on the landscape (prescribed burning and thinning) will help counter increasing fire severity and frequency, suggesting that management may be the key in transitioning to a warmer, drier future, and the key to understanding what these future forests might look like.

Fire in the early Holocene resulted in forests on the north shore of Vancouver with much higher proportions of Douglas fir pollen than are found in many modern day sites (see Marion Lake on the Neotoma Explorer – here, and check out the “Diagram” tab). Douglas fir is not a heavy pollen producer, and yet it reached almost 20% of the pollen sum, along with higher proportions of Alder and Bracken fern, a fire-adapted fern. Currently these taxa are found in low proportions throughout the PNW, except in regions with very low rainfall, and historically low fire return intervals, and proportions of Douglas fir over 20% are almost entirely restricted to southern Vancouver Island in British Columbia, although there may be higher proportions in the US.

Are we heading to a new, old baseline?

Figure 2. You don’t like prescribed burns? Why not? Image from Wikimedia Commons.

It’s unlikely that we’ll see fire return intervals as high as we’ve seen in the past. Active fire management will certainly keep fire activity lower than in the Holocene record because we put so much effort into countering large-scale fire. The interesting thing to me is the idea that we’ll be managing these landscapes for fire, so we’ll have aspects of forest structure that map onto historical forests well: more open canopies, lower biomass, fire tolerant species. But, because of the political volatility of prescribed burning, we are likely to see some fire tolerant species absent from the landscape, particularly understory species that we are unlikely to manage. This might lead to novel species assemblages, with fire tolerant canopy species, and less tolerant understory species. The result is a “fire adapted” landscape that has arisen from active management in the absence of fire: planting, thinning, and continued management, without the presence of fire.

This maps well to what we’ve seen in the Georgia Basin, the encroachment of Scotch Broom into what has historically been Garry Oak Savanna. We have open canopies, a shrub layer of highly flammable, introduced and invasive species, but still the structural attributes of a savanna landscape, minus grasses, so, uh, well, not exactly savanna. But that’s fine, because I said there was no real analogue, so QED. This post is too long anyway 🙂

ESA 2015 – On the way to a new century!

I’m involved in a Plenary Workshop this year, organized by some great folks at UNC-Chapel Hill. I’m privileged to have been asked by these students, all of whom are currently Ph.D candidates. They’ve taken a great idea and turned it into something that will be an excellent Plenary Session, with some (hopefully) long lasting impact. Given the subject (the future of interdisciplinary ecology) it’s also perfectly well suited to the centennial ESA meeting. They’ve just posted this to ECOLOG so I wanted to share it here, since many of my readers are likely involved in interdisciplinary research themselves.

Dear members and friends of the Ecological Society of America (ESA): This survey is relevant to all ecologists, especially those engaged in interdisciplinary research. In celebration of the Centennial of ESA, a team of doctoral students at UNC Chapel Hill are conducting a study to assess the state of interdisciplinary research and scholarship inside and outside of the academy (IRB #15-0821). The results of this study will be shared at an upcoming workshop convened as part of the 100th Meeting of the Ecological Society of America. Results are intended to help workshop organizers identify the challenges and rewards that interdisciplinary ecologists encounter. Click here for more information about this ESA Plenary Workshop and how you can still register (Aug 8th @ 306 Baltimore Convention Center). We welcome participation from ecological researchers at all career levels.

This online survey will take 15 minutes to complete. The survey link will remain active until July 15, 2015. Your participation is completely voluntary and confidential. Keep in mind that no compensation is provided. Your confidential feedback will be used for a peer-reviewed publication and shared widely with the global community of ecologists. Research methods are in full compliance with IRB policies regarding confidentiality and research ethics of the University of North Carolina at Chapel Hill. Please contact Principal Investigator Clare Fieseler for further questions or comments about the survey (link to Ecolog post with contact information)

ANONYMOUS LINK TO SURVEY

Clare Fieseler PhD Candidate & Principal Investigator
Sierra Woodruff CEE PhD Candidate & Co-Investigator
Dennis Tarasi PhD Candidate & Co-Investigator

Explorations in outreach – Creating a Twitter bot for the Neotoma Paleoecological Database.

If you’ve ever been in doubt about whether you chose the right programming language to learn I want to lay those concerns to rest here.

For many scientists, particularly in Biology or the Earth Sciences, there is often a question about whether you should be learning R, Python, Matlab or something else. Especially when you’re coming into scientific programming in grad school with little prior experience this might seem like a daunting proposal. You already don’t know anything about anything, and ultimately you wind up learning whatever you’re taught, or whatever your advisor is using and you wonder. . . Is the grass greener over in Python-land? Those figures look nice, if only I had learned R. . . Why did I learn on an expensive closed platform?

I am here to say “Don’t worry about it”, and I want to emphasize that with an example centered around academic outreach:

The Neotoma Paleoecological Database has had an issue for several years now. We have had a large number of datasets submitted, but very few people could actively upload datasets to the database. Neotoma is a live database, which means that not only do new datasets get added, but, as new information becomes available (for example, new taxonomic designations for certain species) datasets get updated. This means that maintaining the database is very time intensive and there has traditionally been a gap between data ingest and data publication. To make up for this there has been a data “Holding Tank” where individual records have been available, but this wasn’t the best solution.

Fast forward to about a year ago. Eric Grimm at the Illinois State Museum updated the software package Tilia to provide greater access to the database to selected data stewards. Each data type (including insects, pollen, mammal fossils, XRF, ostracodes, lake chemistry) has one or a few stewards who can vet and upload datasets directly to the database using the Tilia platform. This has rapidly increased the speed at which datasets enter Neotoma (over the last month there have been more than 200 new datasets entered), but it’s still hard to get a sense of this as an outsider since people don’t regularly check the database unless they need data from it.

Which brings us to Twitter. Academics have taken to Twitter like academics on a grant. Buzzfeed has posted a list of 25 Twitter feeds for nerds, Science published a somewhat contentious list of scientists to follow, and I’m on Twitter, so obviously all the cool kids are there. This led me to think that Twitter could be a good platform for publicizing new data uploads to Neotoma. Now I just needed to learn how.

The process is fairly straightforward:

Figure out what the most recently posted Neotoma datasets are:
- This is made easier with the Neotoma API, which has a specific method for returning datasets: http://ceiwin10.cei.psu.edu/NDB/RecentUploads?months=1
- You’ll notice (if you click) that the link returns data in a weird format. This format is called JSON and it has been seen by many as the successor to XML (see here for more details).
Check it against two files: (1) a file of everything that’s been tweeted already, and (2) a file with everything that still needs to be tweeted (since we’re not going to tweet everything at once).
Append the new records to the queue of sites to tweet.
Tweet.
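The steps above can be sketched in a few lines of Python. This is a minimal sketch and not the actual Neotomabot code: the `DatasetID` field name, the function names and the idea of storing both files as JSON lists are all assumptions made for illustration, and the final "tweet" step is left to the caller because it depends on your OAuth setup.

```python
import json
import os


def load_records(path):
    """Read a JSON list of records from disk, or return [] if the file is missing."""
    if not os.path.exists(path):
        return []
    with open(path) as handle:
        return json.load(handle)


def save_records(path, records):
    """Write a list of records to disk as JSON."""
    with open(path, "w") as handle:
        json.dump(records, handle)


def update_queue(recent, tweeted, queue, key="DatasetID"):
    """Return the queue plus any recent records that are neither tweeted nor queued.

    `key` is a hypothetical field name; use whatever identifier the API returns.
    """
    seen = {rec[key] for rec in tweeted} | {rec[key] for rec in queue}
    new_queue = list(queue)
    for rec in recent:
        if rec[key] not in seen:
            new_queue.append(rec)
            seen.add(rec[key])
    return new_queue


def next_to_tweet(queue, tweeted):
    """Move the oldest queued record onto the tweeted list and return it."""
    if not queue:
        return None
    rec = queue.pop(0)
    tweeted.append(rec)
    return rec
```

In use, `recent` would be the parsed JSON from the RecentUploads call, `tweeted` and `queue` would come from `load_records`, and both lists would be written back with `save_records` after each tweet so the bot remembers where it left off.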

So that’s it (generally). I’ve been working in R for a while now, so I have a general sense of how these things might happen. The thing is, these same mechanics translate to other languages as well. The hardest thing about programming (in my opinion) is figuring out how the program ought to flow. Everything else is just window dressing. Once you get more established with a programming language you’ll learn the subtleties of the language, but for hack-y programming, you should be able to get the hang of it regardless of your language background.

As evidence, I offer Neotomabot. The code’s all there, and I spent a day figuring out how to program it in Python. To help myself out I planned it all first using long-hand notes, and then hacked it out using Google, StackOverflow and the Python manual. Regardless, it’s the flow control that’s key. With my experience in R I’ve learned how “for” loops work, I know about “while” loops, I know try-catch methods exist and I know I need to read JSON files and push out to Twitter. Given that, I can map out a program and then write the code, and that gives us Neotomabot:

Neotoma welcomes another North American Pollen Database dataset: Nutella Lake from M. Rohr http://t.co/tv5HbxwEfi

— Neotoma Database (@neotomadb) May 5, 2015
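To make the flow-control point concrete, here is a rough sketch of how a “while” loop and Python’s version of try-catch (`try`/`except`) fit together in a bot like this. It is not the Neotomabot source: `run_bot`, `post`, `delay_seconds` and `max_failures` are names I made up for illustration, and `post` stands in for whatever function actually sends the tweet.

```python
import time


def run_bot(queue, post, delay_seconds=0, max_failures=3):
    """Post queued messages one at a time, retrying a message when posting fails.

    Stops when the queue is empty or after `max_failures` failures in a row.
    Returns the list of messages that were posted successfully.
    """
    sent = []
    failures = 0
    while queue and failures < max_failures:
        message = queue[0]
        try:
            post(message)
        except Exception as error:
            # Leave the message in the queue so the next pass tries it again.
            failures += 1
            print(f"Post failed ({error}); will retry.")
        else:
            # Only remove the message once it has actually gone out.
            sent.append(queue.pop(0))
            failures = 0
        time.sleep(delay_seconds)
    return sent
```

The same shape works in R with `while` and `tryCatch`; only the spelling changes, which is really the point of this post.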

All the code is available on the GitHub repository here, except for the OAuth handles, but you can learn more about that aspect from this tutorial: How to Write a Twitter Bot. I found it very useful for getting started. There is also twitteR, a package for R, and there are several good tutorials for it available (here, and here).

So that’s it. You don’t need to worry about picking the wrong language. Learning the basics of any language, and how to map out the solution to a problem is the key. Focus on these and you should be able to shift when needed.

The long tail of under-representation

I am by no means an expert on the subject of under-representation in the sciences. There are some excellent academic bloggers who have done some great work in discussing issues around race and gender in academia (including DNLee, the tweeters at and using #BlackandSTEM – EDIT: this is an amazing post by Dr. Chanda Prescod-Weinstein over at medium.com). This post is intended to highlight what I’ve observed and experienced over the past year or so, with a specific observation surrounding the EarthCube Early Career Travel Grant.

The issue of diversity is tricky in academia, because diversity means different things to different people. In a phone interview I once asked what sort of supports the department had to increase diversity and was told that they had a number of women on faculty. Period. Um . . . I guess that’s one aspect of diversity, but if that’s the end of it, then there are some problems that need to be addressed.

The National Science Foundation has specifically addressed diversity in the Geosciences with a number of initiatives, for example the “Opportunities for Enhancing Diversity in the Geosciences (OEDG) Program”, which (as far as I know) lapsed in 2012, and the subsequent report “Strategic Framework for Education and Diversity, Facilities, International Activities, and Data and Informatics in the Geosciences” (PDF). Among disciplines supported through the NSF, the geosciences is one of the least diverse, and this problem needs to be addressed.

As part of the EarthCube Engagement Team I drafted the EarthCube Early Researcher Travel Grant along with my Team colleagues, who are all fantastic and engaging people and great scientists (you should join our team, check out the link). In providing a larger pool of money for EC researchers from under-represented backgrounds we are recognizing that the causes of underrepresentation are not always overt, but are often structural, related to a lack of funding for travel (in this case), and, correlated to that, a lack of visibility for the research products that these researchers might be developing.

This brings up the secondary issue: Who is part of an under-represented group in the geosciences?

It should be clear that the NSF Geo does value diversity, and has explicitly laid out the groups it considers underrepresented:

“women, minorities (African-Americans, Hispanics, Native Americans, Alaska Natives, Native Hawaiians and other Pacific Islanders), and persons with disabilities”

But this is not an exhaustive list. If you’ve read Tenure She Wrote with any regularity you’ll have come across some of the posts by the author dualitea (see here, and here for some examples). For me many of these posts have been eye-opening, and indicate the much broader set of individuals that are not well represented in the sciences, let alone the geosciences. Along with trans academics we can include the broader LGBT community, potentially invisible, but also underrepresented. In addition, acclimatrix and sarcozona have posted about economic background and academia (respectively here and here). There are obviously many groups of under-represented scholars that lie outside the defined bounds. Even thinking about Terry McGlynn’s work on Small Pond Science (e.g., here and here), we could potentially include students from smaller universities, who are limited by lack of funding and opportunity, and in need of greater and more flexible support from non-traditional sources (such as the EarthCube funding).

So, I bring this all up in an effort to answer a question we received about the Early Career Travel Grants: “What is an under-represented group”?

In our proposal we specifically phrased this portion of the text to read:

Individuals that self-identify as part of an underrepresented group within the geosciences may apply for a $1000 travel grant.

In response to the question as posed we have added further text to the grant:

Applicants are asked to self-report if they constitute as an underrepresented minority. This is to allow for a greater breadth in determining what constitutes diversity within the geosciences, but also to allow applicants to empower themselves by clarifying how this proposal and funding will help further the goals of a broad and diverse geosciences discipline.

I think this does two things. It leaves self-definition open, and provides an opportunity for people to use their own voice to define the challenges they face in a non-judgemental way. We’ve pointed to resources individuals can use to help define what it means to be under-represented in the geosciences, and, hopefully, provided a way to develop a voice around their place in academia and the geosciences.

Does this approach open the possibility that the grant will be abused by people who try to justify ‘under-representation’ that doesn’t exist: “White left-handed males” (my particular group)? Maybe, but I’d rather see people get the money to break down barriers, than generate another barrier to access for people who really need it.

This is a long post, but I thought that it might be interesting for people, and I’m genuinely curious how people feel about this, so please chime in in the comments.

If you know anyone who might be interested in this grant, please forward the link on to them, and if you know of any networks we can use to advertise this and future funding opportunities more widely please let us know!