The long tail of under-representation

I am by no means an expert on the subject of under-representation in the sciences.  There are some excellent academic bloggers who have done great work discussing issues around race and gender in academia (including DNLee, the tweeters using #BlackandSTEM, and others described here).  This post is intended to highlight what I've observed and experienced over the past year or so, with a specific observation about the EarthCube Early Career Travel Grant.

The issue of diversity is tricky in academia, because diversity means different things to different people.  In a phone interview I once asked what kind of support the department had in place to increase diversity and was told that they had a number of women on faculty. Period.  Um . . . I guess that's one aspect of diversity, but if that's the end of it, then there are some problems that need to be addressed.

The National Science Foundation has specifically addressed diversity in the geosciences with a number of initiatives, for example the “Opportunities for Enhancing Diversity in the Geosciences (OEDG) Program”, which (as far as I know) lapsed in 2012, and the subsequent report “Strategic Framework for Education and Diversity, Facilities, International Activities, and Data and Informatics in the Geosciences” (PDF).  Among the disciplines supported by the NSF, the geosciences are among the least diverse, and this problem needs to be addressed.

As part of the EarthCube Engagement Team I drafted the EarthCube Early Career Researcher Travel Grant along with my Team colleagues, who are all fantastic and engaging people and great scientists (you should join our team; check out the link).  In providing a larger pool of money for EarthCube researchers from under-represented backgrounds we are recognizing that the causes of under-representation are not always overt; they are often structural, related to a lack of funding for travel (in this case) and, correlated with that, a lack of visibility for the research products these researchers might be developing.

This brings up the secondary issue: Who is part of an under-represented group in the geosciences?

It should be clear that the NSF Geo does value diversity, and has explicitly laid out the groups it considers underrepresented:

“women, minorities (African-Americans, Hispanics, Native Americans, Alaska Natives, Native Hawaiians and other Pacific Islanders), and persons with disabilities”

But this is not an exhaustive list.  If you've read Tenure She Wrote with any regularity you'll have come across some of the posts by the author dualitea (see here and here for some examples).  For me many of these posts have been eye-opening, and they point to a much broader set of individuals who are not well represented in the sciences, let alone the geosciences.  Along with trans academics we can include the broader LGBT community: potentially invisible, but also under-represented.  In addition, acclimatrix and sarcozona have posted about economic background and academia (here and here, respectively).  There are clearly many groups of under-represented scholars that lie outside the defined bounds.  Thinking about Terry McGlynn's work on Small Pond Science (e.g., here and here), we could also include students from smaller universities, who are limited by a lack of funding and opportunity and in need of greater, more flexible support from non-traditional sources (such as the EarthCube funding).

So, I bring this all up in an effort to answer a question we received about the Early Career Travel Grants: “What is an under-represented group?”

In our proposal we specifically phrased this portion of the text to read:

Individuals that self-identify as part of an underrepresented group within the geosciences may apply for a $1000 travel grant.

In response to the question as posed we have added further text to the grant:

Applicants are asked to self-report whether they are part of an underrepresented minority. This allows for greater breadth in determining what constitutes diversity within the geosciences, but also allows applicants to empower themselves by clarifying how this proposal and funding will help further the goals of a broad and diverse geosciences discipline.

I think this does two things.  It leaves self-definition open, and provides an opportunity for people to use their own voice to define the challenges they face in a non-judgemental way.  We’ve pointed to resources individuals can use to help define what it means to be under-represented in the geosciences, and, hopefully, provided a way to develop a voice around their place in academia and the geosciences.

Does this approach open the possibility that the grant will be abused by people who try to justify ‘under-representation’ that doesn’t exist: “White left-handed males” (my particular group)?  Maybe, but I’d rather see people get the money to break down barriers, than generate another barrier to access for people who really need it.

This is a long post, but I thought that it might be interesting for people, and I’m genuinely curious how people feel about this, so please chime in in the comments.

If you know anyone who might be interested in this grant, please forward the link on to them, and if you know of any networks we can use to advertise this and future funding opportunities more widely please let us know!

Announcing the EarthCube Early Career Researcher Travel Grant

I feel like I've been making lots of funding announcements lately, but this blog sits in a slightly special place at the edge of the ecosphere and the geosphere, so it makes sense to broadcast grants that cross domains.  Besides, it's always fun to get money for the work you do.

EarthCube is offering travel grants of $500 for early career researchers (loosely defined) in the geosciences to attend conferences or workshops where they will be presenting material related to the goals of the EarthCube program.  We will make award decisions four times a year so that researchers can apply closer to the date of their conference.  There's a total of $15,000 available for the year, so please consider applying.  I'll post an update later in the year reporting some of the metrics we're using to track the success of the program.

“EarthCube related activities” can be read fairly broadly: anything at the intersection of cyberinfrastructure, big data and geosciences research.  This would include cross-cutting research involving natural hazards, oceanography, hydrology, climate research, paleoecology or paleobiology, with a component that leverages new or existing cyberinfrastructure in some way.  EarthCube's most recent project overview [PDF] is a good starting point for understanding how your work fits into EarthCube's goals.

In addition, this grant has been designed to recognize that under-represented groups within the geosciences may face greater barriers to retention and advancement within the field. In an effort to provide support that recognizes this fact we are offering $1000 to researchers who self-identify as members of an underrepresented group within the geosciences.

All we're asking for in exchange for a fistful of moolah (once you've applied, I mean) is four paragraphs detailing your experience at the conference/workshop and adding some acknowledgement of EarthCube's support in your proposal.  If you want to mention me in particular you can go ahead, but really, it's not necessary!

If you have any questions feel free to contact me, or use the travel grant email – ec-travel@earthcube.org

Make a Cool $300 (CAD) in Three Easy Steps, the CAP Way

Mary Vetter, the Treasurer of the Canadian Association of Palynologists, passed this message on through our mailing list:

The Canadian Association of Palynologists Annual Student Research Award was established in 2009 to recognize students’ contributions to palynological research. The award is open to any undergraduate or graduate student who is a member, in good standing, of CAP, regardless of their nationality or country of residence. The intent of the research award is to support student research with a strong palynological component. The award consists of a three-year membership in the Association and $300 CDN, to be put toward some aspect of the student’s research.

The application should consist of: (1) a one-page statement outlining the nature of the research project, its scientific importance, the approximate timeline to completion of the project, and the aspect of the research the funds would be directed toward; (2) a CV; and (3) a letter of support from the student's supervisor.

Applications may be submitted in French or English and should be submitted by email. Completed applications are due by March 15, 2015.

Submit applications by e-mail to Dr Francine McCarthy, CAP President (fmccarthy[at]brocku[dot]ca)

Note: Only one award will be given per year, and there will be no limit to the number of times a student can submit an application.

Joining the Canadian Association of Palynologists is fairly straightforward: you can get an application here, and you don't even need to be Canadian. With membership you get the twice-yearly newsletter, an opportunity to join us at our annual meetings, and the chance to join a small but friendly group of researchers interested in all things small, organic-walled and fossilized.

If you know any students who might be interested please pass this along. Thanks!

PalEON has a video

In keeping with the theme of coring pictures I wanted to share PalEON's new video, produced by the Environmental Change Initiative at the University of Notre Dame.  It does a good job of explaining what PalEON does and what we're all about.  There's also a nice sequence, starting at about 2:40, where you get to see a “frozen finger” corer in action.  We break up dry ice, create a slurry with alcohol and then lower the corer down into the lake sediment.

Once the sediment has frozen to the sides of the corer (about 10 – 15 minutes) we bring the corer up and remove the slabs of ice from the sides, keeping track of position, dimensions and orientation so that we can piece it back together.  I’m on the platform with Jason McLachlan and Steve Jackson.

There's a great section in there about the sociology of the PalEON project as well, although it's brief.  So take a second and watch the video; it's great!

The advantages of taking a chance with a new journal – OpenQuaternary

Full disclosure: I’m on the editorial board of Open Quaternary and also manage the blog, but I am not an Editor in Chief and have attempted to ensure that my role as an author and my role as an editor did not conflict.

Figure 1. Neotoma and R together at last!

We (myself, Andria Dawson, Gavin L. Simpson, Eric Grimm, Karthik Ram, Russ Graham and Jack Williams) have a paper in press at a new journal called Open Quaternary.  The paper documents an R package that we developed in collaboration with rOpenSci to access and manipulate data from the Neotoma Paleoecological Database.  In part the project started because of the needs of the PalEON project: we needed a dynamic way to access pollen data from Neotoma, so that analysis products could be updated as new data entered the database.  We also wanted to exploit the new API developed by Brian Bills and Michael Anderson at Penn State's Center for Environmental Informatics.
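
If you're curious what that dynamic access looks like in practice, here's a minimal sketch using the package.  The bounding box and dataset choice are purely illustrative, and the exact arguments may differ in other versions of the package, so treat it as a sketch rather than a recipe:

#  The package can be installed from the rOpenSci GitHub repository:
#  devtools::install_github('ropensci/neotoma')
library(neotoma)

#  Find pollen datasets inside an illustrative bounding box
#  (lonW, latS, lonE, latN), then pull the counts down through the API:
pollen.sets <- get_dataset(datasettype = 'pollen',
                           loc = c(-100, 43, -85, 50))
pollen.data <- get_download(pollen.sets[[1]])

#  Because everything comes straight from the API, re-running this code
#  picks up any records added to the database since the last run.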

There are lots of opinions about where to submit journal articles.  Nature's Research Highlights has a nice summary of a new article in PLoS One (Salinas and Munch, 2015) that tries to identify optimum journals for submission, and Dynamic Ecology discussed the point back in 2013 in a post that drew considerable attention (here, here, and here, among others).  When we thought about where to submit I made the conscious choice to go with an open access journal.  I chose Open Quaternary partly because I'm on the editorial board, but also because I believe that domain-specific journals are still a critical part of the publishing landscape, and because I believe in Open Access publishing.

The downside of this decision is that (1) the journal is new, so there's a risk that people don't know about it and it's less 'discoverable'; and (2) even though it's supported by an established publishing house (Ubiquity Press) it will not obtain an impact factor until it's relatively well established.  Although it's important to argue that impact factors shouldn't make a difference, it's hard not to believe that they do.

Figure 2. When code looks crummy it's not usable. This has since been fixed.

That said, I'm willing to invest in my future and the future of the discipline (hopefully!), and we've already seen a clear advantage of investing in Open Quaternary.  During the revision of our proofs we noticed that the journal's two-column format wasn't well suited to the blocks of code that we presented to illustrate examples in our paper.  We also lost the nice color syntax highlighting that pandoc offers when it renders RMarkdown documents (see examples in our paper's markdown file).  With the help of the journal's Publishing Assistant Paige MacKay, Editor in Chief Victoria Herridge and my co-authors we were able to get the journal to publish the article in a single-column format, with syntax highlighting supported using highlight.js.
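
If you want to see the pandoc highlighting for yourself, here's a quick sketch.  It assumes you have the rmarkdown package and a local pandoc install, and the file name is just a placeholder:

#  Render an RMarkdown source file to HTML; pandoc applies syntax
#  highlighting to the R code chunks in the output:
library(rmarkdown)
render("our_paper.Rmd", output_format = html_document(highlight = "haddock"))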

I may not have a paper in Nature, Science or Cell (the other obvious options for this paper /s), but by contributing to the early stages of a new open access publishing platform I was able to help change the standards, make future contributions more readable, and ensure that my own paper is accessible and readable and that the technical solution we present is easily implemented.

I think that's a win.  The first issue of Open Quaternary should be out in March; until then you can check out our GitHub repository or the PDF as submitted (compleate with typoes).

Cross-scale ecology at the Ecological Society of America Meeting in Baltimore!

Our Organized Oral Session has been approved and a date has been assigned.  ESA 2015 is getting closer every day (the abstract deadline is coming up on February 26th!), and with it the centennial celebration of the Ecological Society of America.  We've managed to recruit a great group of speakers to talk about ecological research that crosses scales of time, rather than space.  Many of these studies share approaches with what we generally consider to be 'cross-scale' ecology, which tends to be spatially focused, but they must also deal with the additional complexity of temporal uncertainty and changing relationships between communities, climate, biogeochemical cycling and disturbance at decadal, centennial and millennial scales.

“Paleoecological patterns, ecological processes, modeled scenarios: Crossing scales to understand an uncertain future” will be held on the afternoon of Wednesday, August 12, 2015, from 1:30 PM to 5:00 PM.  We have a great line-up of speakers confirmed, so please remember to add us to your ESA schedule!

Building your network using ORCiD and ROpenSci

Our neotoma package is part of the rOpenSci network of packages.  Wrangling data structures and learning some of the tricks we've implemented wouldn't have been possible without help from them throughout the coding process.  Recently Scott Chamberlain posted some code for an R package to interface with ORCiD, the rorcid package.

To digress for a second, the neotoma package started out as rNeotoma, but I ditched the 'r' because, well, just because.  I've been second-guessing myself ever since, especially as it became more and more apparent that, in writing proposals and talking about the package and the database, I've basically created a muddle.  Who knows, maybe we'll go back to rNeotoma when we push up to CRAN.  Point being, stick an R in it so that you don't have to keep clarifying the differences.

So, back on point.  A little while ago I posted a network diagram culled from my CV using a bibtex parser in R (the bibtex package by Romain François).  That was kind of fun (obviously worth blogging about), and I stuck a newer version into a job application, but I've really been curious about what it would look like if I went out to the second order: what happens when we combine my publication network with the networks of my collaborators?

Figure 1. A second order co-author network generated using R and ORCiD's public API.  Because we're using the API we can keep re-running this code over and over again and it will fill in as more people sign up to get ORCiDs.

Enter ORCiD.  For those of you not familiar, ORCiD provides a unique identifier to an individual researcher.  The researcher can then identify all the research products they may have published and link these to their ID.  It's effectively a DOI for the individual.  Sign up and you are part of the Internet of Things.  In a lot of ways this is very exciting.  The extent to which ORCiDs can be linked to other objects will be the real test of their staying power.  And even there, it's not so much whether the IDs can be linked (they're unique identifiers, so they're easy to use); it's whether other projects, institutions and data repositories will create a space for ORCiDs so that they can be linked across a web of research products.

Given the number of times I’ve been asked to add an ORCiD to an online profile or account it seems like people are prepared to invest in ORCiD for the long haul, which is exciting, and provides new opportunities for data analysis and for building research networks.

So, let's see what we can do with ORCiD and Scott's rorcid package. This code is all available in a GitHub repository so you can modify it, fork, push or pull as you like:

The idea is to start with a single ORCiD, mine in this case (0000-0002-2700-4605).  With the ORCiD we then discover all of the research products associated with the ID.  Each research product with a DOI can be linked back to each of the ORCiDs registered for coauthors using the ORCiD API.  It is possible to find all co-authors by parsing some of the bibtex files associated with the structured data, but for this exercise I’m just going to stick with co-authors with ORCiDs.

So, for each published article we get the DOI, find all co-authors on each work who have an ORCiD, and then track down each of their publications and co-authors.  If you're interested you can go further down the wormhole by coding this as a recursive function (a rough sketch follows the loop below).  I thought about it, but since this was basically a lark I figured I'd think about it later, or leave it up to someone to add to the existing repository (feel free to fork & modify).

In the end I coded this all up and plotted it using the igraph package (I used network for my last graph, but wanted to try out igraph because it's got some fun interactive tools):

library(devtools)
install_github('ropensci/rorcid')

You need devtools to be able to install the rorcid package directly from the rOpenSci GitHub repository.

library(rorcid)
library(igraph)

# The idea is to go into a user and get all their papers, 
# and all the papers of people they've published with:

simon.record <- orcid_id(orcid = '0000-0002-2700-4605', 
                         profile="works")

This gives us an 'orcid' object, returned using the ORCiD Public API. Once we have the object we can go in and pull out all the DOIs for each of my research products that are registered with ORCiD.

get_doi <- function(x){
  #  This pulls the DOIs out of the ORCiD record:
  list.x <- x$'work-external-identifiers.work-external-identifier'
  
  #  We have to catch a few objects with NULL DOI information
  #  (`||` short-circuits, so we never try to index an empty object):
  do.call(rbind.data.frame, lapply(list.x, function(x){
      if(length(x) == 0 || (!'DOI' %in% x[,1])){
        data.frame(value=NA)
      } else{
        data.frame(value = x[which(x[,1] %in% 'DOI'),2])
      }
    }))
}

get_papers <- function(x){
  all.papers <- x[[1]]$works # this is where the papers are.
  papers <- data.frame(title = all.papers$'work-title.title.value',
                       doi   = get_doi(all.papers))
  
  paper.doi <- lapply(1:nrow(papers), function(x){
    if(!is.na(papers[x,2]))return(orcid_doi(dois = papers[x,2], fuzzy = FALSE))
    # sometimes there's no DOI
    # if that's the case then just return NA:
    return(NA)
  })

  #  Where a work had no DOI we carry an NA row; otherwise we pull the
  #  co-authors' ORCiDs and names from the DOI lookup:
  your.papers <- lapply(1:length(paper.doi), function(x){
      if(is.na(paper.doi[[x]])){
        data.frame(doi=NA, orcid=NA, name=NA)
      } else {
        data.frame(doi = papers[x,2],
                   orcid = paper.doi[[x]][[1]]$data$'orcid-identifier.path',
                   name = paste(paper.doi[[x]][[1]]$data$'personal-details.given-names.value',
                                paper.doi[[x]][[1]]$data$'personal-details.family-name.value', 
                                sep = ' '),
                   stringsAsFactors = FALSE)
      }})
  do.call(rbind.data.frame, your.papers)
  
}

So now that we've got the functions, we can get all my papers, make a list of the unique ORCiDs of my colleagues and then get all of their papers using the same 'get_papers' function. It's a bit sloppy I think, but I wanted to avoid duplicate calls to the API since my internet connection was kind of crummy.

simons <- get_papers(simon.record)

unique.orcids <- unique(simons$orcid)

all.colleagues <- list()

for(i in 1:length(unique.orcids)){
  all.colleagues[[i]] <- get_papers(orcid_id(orcid = unique.orcids[i], profile="works"))
}
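
As an aside, the recursive version I mentioned above might look something like the sketch below.  It's untested and isn't in the repository; get_network() is a hypothetical helper built on the get_papers() function defined earlier, and at any real depth it would hammer the API fairly hard.

#  A rough, untested sketch of a recursive crawl: depth = 1 roughly
#  reproduces the loop above, higher values walk further out through
#  each co-author's co-authors.
get_network <- function(orcid, depth = 1){
  papers <- get_papers(orcid_id(orcid = orcid, profile = "works"))
  if(depth == 0) return(papers)
  
  colleagues <- setdiff(unique(na.omit(papers$orcid)), orcid)
  deeper <- lapply(colleagues, get_network, depth = depth - 1)
  
  #  Stack everything into one data.frame and drop duplicate rows:
  unique(do.call(rbind.data.frame, c(list(papers), deeper)))
}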

So now we've got a list with a data.frame for each author that has three columns: the DOI, the ORCiD and the author's name. We want to reduce this to a single data.frame and then fill a square matrix (each row and column represents an author) where each row x column intersection represents co-authorship.


all.df <- do.call(rbind.data.frame, all.colleagues)
all.df <- na.omit(all.df[!duplicated(all.df),])

all.pairs <- matrix(ncol = length(unique(all.df$name)),
                    nrow = length(unique(all.df$name)),
                    dimnames = list(unique(all.df$name),unique(all.df$name)), 0)

unique.dois <- unique(as.character(all.df$doi))

for(i in 1:length(unique.dois)){
  doi <- unique.dois[i]
  
  all.pairs[all.df$name[all.df$doi %in% doi],all.df$name[all.df$doi %in% doi]] <- 
    all.pairs[all.df$name[all.df$doi %in% doi],all.df$name[all.df$doi %in% doi]] + 1

}

all.pairs <- all.pairs[rowSums(all.pairs)>0, colSums(all.pairs)>0]

diag(all.pairs) <- 0

Again, probably some lazy coding in the 'for' loop, but the point is that each row and column has a dimname representing an author, so row 1 is 'Simon Goring' and column 1 is also 'Simon Goring'. All we're doing is incrementing the value of the cell where two co-authors intersect, with names pulled from all individuals associated with each unique DOI. We end by plotting the whole thing out:


author.adj <- graph.adjacency(all.pairs, mode = 'undirected', weighted = TRUE)
#  Plot so that the width of the lines connecting the nodes reflects the
#  number of papers co-authored by both individuals.
#  This is Figure 1 of this blog post.
plot(author.adj, vertex.label.cex = 0.8, edge.width = E(author.adj)$weight)
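
And since igraph's interactive tools were part of the appeal, tkplot() will draw the same network in a window where you can drag the nodes around by hand (it needs R's Tcl/Tk support, so consider it an optional extra rather than part of the main script):

#  An interactive, draggable version of the same network (requires Tcl/Tk):
tkplot(author.adj, vertex.label.cex = 0.8, edge.width = E(author.adj)$weight)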