Explorations in outreach – Creating a Twitter bot for the Neotoma Paleoecological Database.

If you’ve ever been in doubt about whether you chose the right programming language to learn, I want to lay those concerns to rest here.

For many scientists, particularly in Biology or the Earth Sciences, there is often a question about whether you should be learning R, Python, Matlab or something else.  Especially when you’re coming into scientific programming in grad school with little prior experience, this might seem like a daunting prospect.  You already don’t know anything about anything, and ultimately you wind up learning whatever you’re taught, or whatever your advisor is using, and you wonder. . . Is the grass greener over in Python-land? Those figures look nice, if only I had learned R. . . Why did I learn on an expensive closed platform?

I am here to say “Don’t worry about it”, and I want to emphasize that with an example centered around academic outreach:

The Neotoma Paleoecological Database has had an issue for several years now.  We have had a large number of datasets submitted, but very few people could actively upload datasets to the database.  Neotoma is a live database, which means that not only do new datasets get added, but, as new information becomes available (for example, new taxonomic designations for certain species) datasets get updated.  This means that maintaining the database is very time intensive and there has traditionally been a gap between data ingest and data publication.  To make up for this there has been a data “Holding Tank” where individual records have been available, but this wasn’t the best solution.

Fast forward to about a year ago. Eric Grimm at the Illinois State Museum updated the software package Tilia to give selected data stewards greater access to the database.  Each data type (including insects, pollen, mammal fossils, XRF, ostracodes, lake chemistry) has one or a few stewards who can vet and upload datasets directly to the database using the Tilia platform. This has rapidly increased the speed at which datasets enter Neotoma (over the last month more than 200 new datasets have been entered), but it’s still hard to get a sense of this as an outsider, since people don’t regularly check the database unless they need data from it.

Which brings us to Twitter. Academics have taken to Twitter like academics on a grant.  Buzzfeed has posted a list of 25 Twitter feeds for nerds, Science published a somewhat contentious list of scientists to follow, and I’m on Twitter, so obviously all the cool kids are there. This led me to think that Twitter could be a good platform for publicizing new data uploads to Neotoma.  Now I just needed to learn how.

The process is fairly straightforward:

  1. Figure out what the most recently posted Neotoma datasets are:
    • This is made easier with the Neotoma API, which has a specific method for returning datasets: http://ceiwin10.cei.psu.edu/NDB/RecentUploads?months=1
    • You’ll notice (if you click) that the link returns data in a weird format.  This format is called JSON and it has been seen by many as the successor to XML (see here for more details).
  2. Check it against two files: (1) a file of everything that’s been tweeted already, and (2) a file with everything that needs to be tweeted (since we’re not going to tweet everything at once).
  3. Append the new records to the queue of sites to tweet.
  4. Tweet.
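
The four steps above can be sketched in Python roughly as follows. This is a minimal illustration of the flow, not the actual Neotomabot code. The `"data"` and `"DatasetID"` field names are assumptions about the JSON layout, the function names are made up for this example, and `post` stands in for whatever function actually sends the tweet (the Twitter and OAuth parts are left out entirely).

```python
import json
from pathlib import Path
from urllib.request import urlopen

# URL taken from the text above; the service may no longer respond there.
RECENT_URL = "http://ceiwin10.cei.psu.edu/NDB/RecentUploads?months=1"


def fetch_recent(url=RECENT_URL):
    """Step 1: download and parse the JSON list of recent uploads."""
    with urlopen(url) as response:
        payload = json.load(response)
    # Assumption: the records sit under a "data" key in the response.
    return payload["data"]


def load_records(path):
    """Read a JSON list from disk; return an empty list if the file is missing."""
    path = Path(path)
    if not path.exists():
        return []
    return json.loads(path.read_text())


def save_records(path, records):
    """Write a list of records back to disk as JSON."""
    Path(path).write_text(json.dumps(records, indent=2))


def update_queue(recent, tweeted, queue, key="DatasetID"):
    """Steps 2 and 3: append records that are neither tweeted nor queued."""
    # Assumption: each record carries a unique identifier under `key`.
    seen = {r[key] for r in tweeted} | {r[key] for r in queue}
    for record in recent:
        if record[key] not in seen:
            queue.append(record)
            seen.add(record[key])
    return queue


def tweet_next(queue, tweeted, post):
    """Step 4: post the oldest queued record and move it to the tweeted list."""
    if not queue:
        return None
    record = queue[0]
    try:
        post(record)  # hypothetical callable that sends the tweet
    except Exception as err:
        # Leave the record in the queue so it can be retried later.
        print(f"Could not post {record}: {err}")
        return None
    tweeted.append(queue.pop(0))
    return record
```

In use, a script would call `fetch_recent()`, load the two files with `load_records`, run `update_queue`, call `tweet_next` once per scheduled run, and then write both lists back with `save_records`. Keeping the posting function as an argument means the queue logic can be tried out without touching Twitter at all.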

So that’s it (generally).  I’ve been working in R for a while now, so I have a general sense of how these things might happen. The thing is, these same mechanics translate to other languages as well. The hardest thing about programming (in my opinion) is figuring out how the program ought to flow. Everything else is just window dressing. Once you get more established with a programming language you’ll learn the subtleties of the language, but for hack-y programming, you should be able to get the hang of it regardless of your language background.

As evidence, Neotomabot. The code’s all there, and I spent a day figuring out how to program it in Python. To help myself out I planned it all first using long-hand notes, and then hacked it out using Google, StackOverflow and the Python manual.  Regardless, it’s the flow control that’s key. With my experience in R I’ve learned how “for” loops work, I know about “while” loops, I know try-catch methods exist and I know I need to read JSON files and push out to Twitter. Given that, I can map out a program and then write the code, and that gives us Neotomabot:

All the code is available on the GitHub repository here, except for the OAuth handles, but you can learn more about that aspect from this tutorial: How to Write a Twitter Bot. I found it very useful for getting started.  There is also twitteR, a package for R, and there are several good tutorials for it available (here, and here).

So that’s it.  You don’t need to worry about picking the wrong language. Learning the basics of any language, and how to map out the solution to a problem, is the key.  Focus on these and you should be able to shift when needed.

No one reads your blog: Reflections on the middling bottom.

Two weeks ago Terry McGlynn posted reflections about blogging on Small Pond Science, an excellent blog that combines research, teaching reflections and other assorted topics.  Two weeks ago I didn’t post anything.  Three weeks ago I didn’t post anything.  The week before I posted a comment of Alwynne Beaudoin’s that is great, but wasn’t really mine (although she gave me permission to post it).  The last thing I posted myself was a long primer on using GitHub that I posted six weeks ago.

Guest post: What skills do you wish you learned? What skill should you impart?

Recently on CAGList, the mailing list for the Canadian Association of Geographers, an early career researcher asked established researchers what kind of training they wish they had obtained as grad students and post-docs.  Alwynne Beaudoin, Adjunct Professor at the University of Alberta, Curator of Quaternary Environments at the Royal Alberta Museum and active member of the Canadian Association of Palynologists, posted an excellent reply.

I asked her if it would be okay to post it here (and she’s agreed), because I think it speaks to the heart of what many of us are beginning to realize:  Our ‘hard skills’ training is often excellent, but the soft skills that make our lives much more manageable and enjoyable, and that can play a significant role in career development both inside and outside academia, are often overlooked.


If you’re going to solicit papers for a journal, use proper grammar. Lessons for predatory publishers.

Here’s a great solicitation from David Publishing Company, a company on Beall’s list of predatory publishers (I’ve reformatted it so you don’t have to see all the fonts they used):

Dear Goring, Simon J ,

 This is Earth Science and Engineering (ISSN 2159-581X), a new professional journal published across the United States by David Publishing Company, Chicago, IL, USA. We have learned your paper“RELIABLE GRIDDED ESTIMATES OF PRE-SETTLEMENT VEGETATION FOR THE UPPER MIDWEST FROM PUBLIC LANDS SURVEY DATA in the AMQUA 2012 . We are very interested  in your paper . If the paper  has  not  published  in other  journal ,we would like to publish your paper in our journal Earth Science and Engineering.  All your original and  unpublished paper are welcome (although  the paper  has  already  been  published  in  the conference,  it can  also  publish in  our  journal,  becauce there  is  no  ISSN or  ISBN  for the conference). If you have the idea of making  our journal a vehicle for your research  interests, please send electronic version of your papers or books to us through email attachment in MS word format.

Sounds great, but if you can’t reliably edit your own email solicitations then I’m not interested.  Then there’s this at the end:

As an American academic publishing group, we wish to become your friends if we may.

I’m not really sure what that’s about.  Are they friendly because they’re American?  Because we’re both American?  Do they know I’m Canadian, and if they don’t, will that hurt my chances of being their friend?  How do you become friends with a publishing group anyway?  So many questions. . .

Their website leads to more questions, from the “For Authors” page (blank) to the Survey on the front page (what’s it even about?).  At least they provide me with the option to list myself as an “Academician” when I submit my paper.  That’s the number one reason I’ve never submitted to Nature or Science: no opportunity to list my title as “Academician”.