Explorations in outreach – Creating a Twitter bot for the Neotoma Paleoecological Database.

If you’ve ever been in doubt about whether you chose the right programming language to learn, I want to lay those concerns to rest here.

For many scientists, particularly in Biology or the Earth Sciences, there is often a question about whether you should be learning R, Python, Matlab or something else.  Especially when you’re coming into scientific programming in grad school with little prior experience this might seem like a daunting proposal.  You already don’t know anything about anything, and ultimately you wind up learning whatever you’re taught, or whatever your advisor is using and you wonder. . . Is the grass greener over in Python-land? Those figures look nice, if only I had learned R. . . Why did I learn on an expensive closed platform?

I am here to say “Don’t worry about it”, and I want to emphasize that with an example centered around academic outreach:

The Neotoma Paleoecological Database has had an issue for several years now.  We have had a large number of datasets submitted, but very few people could actively upload datasets to the database.  Neotoma is a live database, which means that not only do new datasets get added, but, as new information becomes available (for example, new taxonomic designations for certain species) datasets get updated.  This means that maintaining the database is very time intensive and there has traditionally been a gap between data ingest and data publication.  To make up for this there has been a data “Holding Tank” where individual records have been available, but this wasn’t the best solution.

Fast forward to about a year ago. Eric Grimm at the Illinois State Museum updated the software package Tilia to provide greater access to the database to selected data stewards.  Each data type (including insects, pollen, mammal fossils, XRF, ostracodes, lake chemistry) has one or a few stewards who can vet and upload datasets directly to the database using the Tilia platform. This has rapidly increased the speed at which datasets enter Neotoma (over the last month more than 200 new datasets have been entered), but it’s still hard to get a sense of this as an outsider since people don’t regularly check the database unless they need data from it.

Which brings us to Twitter. Academics have taken to Twitter like academics on a grant.  Buzzfeed has posted a list of 25 Twitter feeds for nerds, Science published a somewhat contentious list of scientists to follow, and I’m on Twitter, so obviously all the cool kids are there. This led me to think that Twitter could be a good platform for publicizing new data uploads to Neotoma.  Now I just needed to learn how.

The process is fairly straightforward:

  1. Figure out what the most recently posted Neotoma datasets are:
    • This is made easier with the Neotoma API, which has a specific method for returning datasets: http://ceiwin10.cei.psu.edu/NDB/RecentUploads?months=1
    • You’ll notice (if you click) that the link returns data in a weird format.  This format is called JSON and it has been seen by many as the successor to XML (see here for more details).
  2. Check it against two files, (1) a file of everything that’s been tweeted already, and (2) a file with everything that needs to be tweeted (since we’re not going to tweet everything at once)
  3. Append the new records to the queue of sites to tweet.
  4. Tweet.
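
A minimal sketch of those four steps in Python might look something like the block below. This is not the actual Neotomabot code (that lives in the GitHub repository); the function names, the "DatasetID" and "SiteName" field names and the layout of the JSON response are all assumptions made for illustration. The two files are stood in for by plain lists, and the actual posting to Twitter is left out because it needs OAuth credentials.

```python
import json
from urllib.request import urlopen

# URL taken from the post; the layout of the response is an assumption.
RECENT_UPLOADS_URL = "http://ceiwin10.cei.psu.edu/NDB/RecentUploads?months=1"


def fetch_recent(url=RECENT_UPLOADS_URL):
    """Step 1: download the recent uploads and return a list of records."""
    with urlopen(url) as response:
        payload = json.load(response)
    # Assumed: records are the top-level list, or sit under a "data" key.
    if isinstance(payload, dict):
        return payload.get("data", [])
    return payload


def update_queue(recent, tweeted_ids, queue, id_key="DatasetID"):
    """Steps 2 and 3: queue records that are neither tweeted nor queued."""
    seen = set(tweeted_ids) | {record[id_key] for record in queue}
    for record in recent:
        if record[id_key] not in seen:
            queue.append(record)
            seen.add(record[id_key])
    return queue


def next_tweet(queue, tweeted_ids, id_key="DatasetID", name_key="SiteName"):
    """Step 4: take the oldest queued record and build the tweet text."""
    if not queue:
        return None
    record = queue.pop(0)
    tweeted_ids.append(record[id_key])
    return "New Neotoma dataset: {} (dataset {})".format(
        record[name_key], record[id_key]
    )
```

In use, you would call fetch_recent(), pass the result to update_queue() along with the two lists read from disk, send the string from next_tweet() to Twitter, and then write both lists back to their files. Tweeting one record per run is what keeps the bot from posting everything at once.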

So that’s it (generally).  I’ve been working in R for a while now, so I have a general sense of how these things might happen. The thing is, these same mechanics translate to other languages as well. The hardest thing about programming (in my opinion) is figuring out how the program ought to flow. Everything else is just window dressing. Once you get more established with a programming language you’ll learn the subtleties of the language, but for hack-y programming, you should be able to get the hang of it regardless of your language background.

As evidence, Neotomabot. The code’s all there, and I spent a day figuring out how to program it in Python. But to help myself out I planned it all first using long-hand notes, and then hacked it out using Google, StackOverflow and the Python manual.  Regardless, it’s the flow control that’s key. With my experience in R I’ve learned how “for” loops work, I know about “while” loops, I know try-catch methods exist and I know I need to read JSON files and push out to Twitter. Given that, I can map out a program and then write the code, and that gives us Neotomabot:

All the code is available on the GitHub repository here, except for the OAuth handles, but you can learn more about that aspect from this tutorial: How to Write a Twitter Bot. I found it very useful for getting started.  There is also twitteR, a Twitter package for R, and there are several good tutorials for the package available (here, and here).

So that’s it.  You don’t need to worry about picking the wrong language. Learning the basics of any language, and how to map out the solution to a problem is the key.  Focus on these and you should be able to shift when needed.

The long tail of under-representation

I am by no means an expert on the subject of under-representation in the sciences.  There are some excellent academic bloggers who have done some great work in discussing issues around race and gender in academia (including DNLee, the tweeters at and using #BlackandSTEM – EDIT: this is an amazing post by Dr. Chanda Prescod-Weinstein over at medium.com).  This post is intended to highlight what I’ve observed and experienced over the past year or so, with a specific observation surrounding the EarthCube Early Career Travel Grant.

The issue of diversity is tricky in academia, because diversity means different things to different people.  In a phone interview I once asked what sort of supports the department had to increase diversity and was told that they had a number of women on faculty. Period.  Um . . . I guess that’s one aspect of diversity, but if that’s the end of it, then there are some problems that need to be addressed.

The National Science Foundation has specifically addressed diversity in the Geosciences with a number of initiatives, for example the “Opportunities for Enhancing Diversity in the Geosciences (OEDG) Program” which (as far as I know) lapsed in 2012, and the subsequent report “Strategic Framework for Education and Diversity, Facilities, International Activities, and Data and Informatics in the Geosciences” (PDF).  Among disciplines supported through the NSF the geosciences is one of the least diverse, and this problem needs to be addressed.

As part of the EarthCube Engagement Team I drafted the EarthCube Early Researcher Travel Grant along with my Team colleagues, who are all fantastic and engaging people and great scientists (you should join our team, check out the link).  In providing a larger pool of money for EC researchers from under-represented backgrounds we are recognizing that the causes of underrepresentation are not always overt, but are often structural, related to a lack of funding for travel (in this case), and, correlated to that, a lack of visibility for the research products that these researchers might be developing.

This brings up the secondary issue: Who is part of an under-represented group in the geosciences?

It should be clear that the NSF Geo does value diversity, and has explicitly laid out the groups it considers underrepresented:

“women, minorities (African-Americans, Hispanics, Native Americans, Alaska Natives, Native Hawaiians and other Pacific Islanders), and persons with disabilities”

But this is not an exhaustive list.  If you’ve read Tenure She Wrote with any regularity you’ll have come across some of the posts by the author dualitea (see here, and here for some examples).  For me many of these posts have been eye-opening, and indicate the much broader set of individuals that are not well represented in the sciences, let alone the geosciences.  Along with trans academics we can include the broader LGBT community, potentially invisible, but also underrepresented.  In addition acclimatrix and sarcozona have posted about economic background and academia (respectively here and here).  There are obviously many groups of under-represented scholars that lie outside the defined bounds.  Even thinking about Terry McGlynn’s work on Small Pond Science (e.g., here and here), we could potentially include students from smaller universities, who are limited by lack of funding and opportunity, and in need of greater, and more flexible, support from non-traditional sources (such as the EarthCube funding).

So, I bring this all up in an effort to answer a question we received about the Early Career Travel Grants: “What is an under-represented group”?

In our proposal we specifically phrased this portion of the text to read:

Individuals that self-identify as part of an underrepresented group within the geosciences may apply for a $1000 travel grant.

In response to the question as posed we have added further text to the grant:

Applicants are asked to self-report if they constitute as an underrepresented minority. This is to allow for a greater breadth in determining what constitutes diversity within the geosciences, but also to allow applicants to empower themselves by clarifying how this proposal and funding will help further the goals of a broad and diverse geosciences discipline.

I think this does two things.  It leaves self-definition open, and provides an opportunity for people to use their own voice to define the challenges they face in a non-judgemental way.  We’ve pointed to resources individuals can use to help define what it means to be under-represented in the geosciences, and, hopefully, provided a way to develop a voice around their place in academia and the geosciences.

Does this approach open the possibility that the grant will be abused by people who try to justify ‘under-representation’ that doesn’t exist: “White left-handed males” (my particular group)?  Maybe, but I’d rather see people get the money to break down barriers, than generate another barrier to access for people who really need it.

This is a long post, but I thought that it might be interesting for people, and I’m genuinely curious how people feel about this, so please chime in in the comments.

If you know anyone who might be interested in this grant, please forward the link on to them, and if you know of any networks we can use to advertise this and future funding opportunities more widely please let us know!

Announcing the EarthCube Early Career Researcher Travel Grant

I feel like I’ve been making lots of funding announcements, but this blog has a slightly special place on the edge of the ecosphere and the geosphere, so it makes sense to broadcast grants that also cross domains, since it’s always fun to get money for the work you do.


EarthCube is offering travel grants of $500 for early career researchers (loosely defined) in the geosciences to attend conferences or workshops where they will be presenting material related to the goals of the EarthCube program.  We decided to make decisions four times a year so that researchers could apply closer to the date of the conference.  There’s a total of $15,000 for the year, so please consider applying.  I’ll update later in the year to report some of the metrics we’re using to track the success of the program.

EarthCube related activities could be read fairly broadly: anything intersecting cyberinfrastructure, big data and geosciences related research.  This would include cross-cutting research involving natural hazards, oceanography, hydrology, climate research, paleoecology or paleobiology, with a component that leverages new or existing cyberinfrastructure in some way. Looking at EarthCube’s most recent project overview [PDF] is a good starting point to understand how your work fits into EarthCube’s goals.

In addition, this grant has been designed to recognize that under-represented groups within the geosciences may face greater barriers to retention and advancement within the field. In an effort to provide support that would recognize this fact we are offering $1000 to researchers who self-identify as members of an underrepresented group within the geosciences.

All we’re asking for in exchange for a fistful of moolah (once you’ve applied I mean) is four paragraphs detailing your experience at the conference/workshop and some acknowledgement of EarthCube’s support in your proposal.  If you want to mention me in particular you can go ahead, but really, it’s not necessary!

If you have any questions feel free to contact me, or use the travel grant email – ec-travel@earthcube.org

Make a Cool $300 (CAD) in Three Easy Steps The CAP Way

Mary Vetter, the Treasurer of the Canadian Association of Palynologists, passed this message on through our mailing list:

The Canadian Association of Palynologists Annual Student Research Award was established in 2009 to recognize students’ contributions to palynological research. The award is open to any undergraduate or graduate student who is a member, in good standing, of CAP, regardless of their nationality or country of residence. The intent of the research award is to support student research with a strong palynological component. The award consists of a three-year membership in the Association and $300 CDN, to be put toward some aspect of the student’s research.

The application should consist of: (1) a one-page statement outlining the nature of the research project, its scientific importance, the approximate timeline to completion of the project, and the aspect of the research the funds would be directed toward; (2) a CV; and, (3) a letter of support from the student’s supervisor.

Applications may be submitted in French or English and should be submitted by email. Completed applications are due by March 15, 2015.

Submit applications by e-mail to Dr Francine McCarthy, CAP President (fmccarthy[at]brocku[dot]ca)

Note: Only one award will be given per year, and there will be no limit to the number of times a student can submit an application.

Joining the Canadian Association of Palynologists is fairly straightforward: you can get an application here, and you don’t even need to be a Canadian. With membership you get the twice yearly newsletter, an opportunity to join us at our annual meetings and the chance to join a small, but friendly group of researchers who are interested in all things small, organic-walled and fossilized.

If you know any students who might be interested please pass this along. Thanks!

PalEON has a video

In keeping with the theme of coring pictures I wanted to share PalEON’s new video, produced by the Environmental Change Initiative at the University of Notre Dame.  It does a good job of explaining what PalEON does and what we’re all about.  There’s also a nice sequence, starting at about 2:40, where you get to see a “frozen finger” corer in action.  We break up dry ice, create a slurry with alcohol and then drop it into the lake sediment.

Once the sediment has frozen to the sides of the corer (about 10 – 15 minutes) we bring the corer up and remove the slabs of ice from the sides, keeping track of position, dimensions and orientation so that we can piece it back together.  I’m on the platform with Jason McLachlan and Steve Jackson.

There’s a great section in there about the sociology of the PalEON project as well, although it’s brief.  So take a second and watch the video, it’s great!

The advantages of taking a chance with a new journal – OpenQuaternary

Full disclosure: I’m on the editorial board of Open Quaternary and also manage the blog, but I am not an Editor in Chief and have attempted to ensure that my role as an author and my role as an editor did not conflict.

Figure 1.  Neotoma and R together at last!

We (myself, Andria Dawson, Gavin L. Simpson, Eric Grimm, Karthik Ram, Russ Graham and Jack Williams) have a paper in press at a new journal called Open Quaternary.  The paper documents an R package that we developed in collaboration with rOpenSci to access and manipulate data from the Neotoma Paleoecological Database.  In part the project started because of the needs of the PalEON project.  We needed a dynamic way to access pollen data from Neotoma, so that analysis products could be updated as new data entered the database.  We also wanted to exploit the new API developed by Brian Bills and Michael Anderson at Penn State’s Center for Environmental Informatics.

There are lots of thoughts about where to submit journal articles.  Nature’s Research Highlights has a nice summary about a new article in PLoS One (Salinas and Munch, 2015) that looks to identify optimum journals for submission, and Dynamic Ecology discussed the point back in 2013, a post that drew considerable attention (here, here, and here, among others).  When we thought about where to submit I made the conscious choice to choose an Open Access journal. I chose Open Quaternary partly because I’m on the editorial board, but also because I believe that domain specific journals are still a critical part of the publishing landscape, and because I believe in Open Access publishing.

The downsides of this decision were that (1) the journal is new, so there’s a risk that people don’t know about it, and it’s less ‘discoverable’; (2) even though it’s supported by an established publishing house (Ubiquity Press) it will not obtain an impact factor until it’s relatively well established.  Although it’s important to argue that impact factors should not make a difference, it’s hard not to believe that they do make a difference.

Figure 2.  When code looks crummy it's not usable.  This has since been fixed.

That said, I’m willing to invest in my future and the future of the discipline (hopefully!), and we’ve already seen a clear advantage of investing in Open Quaternary.  During the revision of our proofs we noticed that the journal’s two column format wasn’t well suited to the blocks of code that we presented to illustrate examples in our paper.  We also lost the nice color syntax highlighting that pandoc offers when it renders RMarkdown documents (see examples in our paper’s markdown file).  With the help of the journal’s Publishing Assistant Paige MacKay, Editor in Chief Victoria Herridge and my co-authors we were able to get the journal to publish the article in a single column format, with syntax highlighting supported using highlight.js.

I may not have a paper in Nature, Science or Cell (the other obvious option for this paper /s) but by contributing to the early stages of a new open access publishing platform I was able to change the standards and make future contributions more readable and make sure that my own paper is accessible, readable and that the technical solution we present is easily implemented.

I think that’s a win.  The first issue of Open Quaternary should be out in March, until then you can check out our GitHub repository or the PDF as submitted (compleate with typoes).

Cross-scale ecology at the Ecological Society of America Meeting in Baltimore!

Our Organized Oral Session has been approved and a date has been assigned.  ESA 2015 is getting closer every day (abstract deadline is coming up on February 26th!), and with it the centennial celebration of the Ecological Society of America.  We’ve managed to recruit a great group of speakers to talk about ecological research that crosses scales of time, rather than space.  Many of these studies share approaches with what we generally consider to be ‘cross-scale’ ecology, which tends to be spatially focused, but they must also deal with the additional complexity of temporal uncertainty and changing relationships between communities, climate, biogeochemical cycling and disturbance at decadal, centennial and millennial scales.

“Paleoecological patterns, ecological processes, modeled scenarios: Crossing scales to understand an uncertain future” will be held on the afternoon of Wednesday, August 12, 2015, from 1:30 PM – 5:00 PM.  We have a great lineup of speakers confirmed, so please remember to add us to your ESA schedule!