We had a lab meeting this week where we talked a bit about some of the issues surrounding blogging; in particular, we talked about trolls and their annoying trolling. I should be careful here, because the term ‘troll’ has evolved a bit over the last few years. My original understanding was that a troll was someone who posted contentious material (inflammatory or offensive) for the purpose of getting a rise out of people or derailing a conversation. The most important part of that early definition was intent: the person was posting in order to derail the conversation, and often did not believe what they were saying. Under what I think of as the newer definition, a troll is basically anyone posting inflammatory comments, whether they believe them or not. I’m going to use this second definition of the term from now on.
In the case of climate change denial, I’ve had an idea for a while that, while some trolls are genuine jerks who feel the need to vent, some are paid jerks. If that’s the case, then it should show up in the IP logs for blog comments or for emails sent to individuals whose only crime was being a climate scientist. Has anyone done this kind of analysis? I would assume it’s pretty easy to do. The second thing I’ve been curious about is whether an analysis of troll posts might show some kind of similarity across posts.
Could we actually identify trolls in the wild using word frequencies, or are their posts too short?
Just an idea. I’ve looked at some R packages for text analysis, and it might be a fun project. Anyone have a bunch of hate mail?
1. Build a corpus of hate mail (do you think Mike Mann would share?)
2. Use a package like tm in R to build some clusters, then take a look at the clusters, their strength and their geographic coherence.
3. Somewhere in there you’d have to learn about text mining too. 🙂
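To make the word-frequency idea a little more concrete, here is a rough sketch of the first step: turn each message into a vector of word counts and compare messages by cosine similarity. It is written in Python with only the standard library rather than with tm in R, purely for illustration. The function names and the three messages are made up, and a real analysis would at least drop stop words and deal with very short posts.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count word tokens (letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 means no shared words)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(messages):
    """Pairwise cosine similarity between all messages."""
    vectors = [word_counts(m) for m in messages]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Made-up placeholder messages, not real hate mail.
messages = [
    "your climate models are a fraud and a hoax",
    "the models are a hoax and you are a fraud",
    "thanks for the interesting talk on pollen records",
]
sim = similarity_matrix(messages)
```

With these placeholders, the first two messages score as far more similar to each other than either does to the third. A matrix like `sim` is the kind of input a clustering step would work from, although common words like "a" and "are" drive much of the similarity here, which is part of why short posts are a worry.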
On my bike in to work today I came up with some hypotheses:
H0: There is no spatial structure to the hate mail.
H1: The hate mail represents genuine emails from people who are upset that climate science produces the results it does, and these people are representative of the general population in the US, so the spatial structure of the mail should be similar to the spatial structure of public opinion on climate change in the US.
H2: The hate mail represents efforts to simulate grassroots opposition, and so the spatial structure should reflect the distribution of lobbyist groups that might support industries with vested interests in opposing action on climate change.
H3: Lobbyist groups may be involved with these emails, but they use technology to anonymize their IP addresses, and so the spatial distribution of addresses will mimic that of the Tor network.
H4: The final distribution will mimic both H3 and H1 to some degree.
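One crude way to put hypotheses like these side by side would be to count messages per region and compare those counts with the share of mail each hypothesis predicts, using a Pearson chi-square statistic. The sketch below is in Python with no external libraries. The region names, counts and expected shares are all invented for illustration, and it assumes the IP addresses have already been geolocated, which is its own problem.

```python
def chi_square_statistic(observed, expected_share):
    """Pearson chi-square statistic for observed counts vs. expected shares per region."""
    total = sum(observed.values())
    stat = 0.0
    for region, share in expected_share.items():
        expected = total * share
        stat += (observed.get(region, 0) - expected) ** 2 / expected
    return stat


# Hypothetical counts of messages geolocated to four made-up regions.
observed = {"north": 10, "south": 10, "east": 70, "west": 10}

# Hypothetical baseline: mail spread evenly across regions.
uniform = {"north": 0.25, "south": 0.25, "east": 0.25, "west": 0.25}

# Hypothetical alternative: mail concentrated where some set of organisations sits.
clustered = {"north": 0.1, "south": 0.1, "east": 0.7, "west": 0.1}

stat_uniform = chi_square_statistic(observed, uniform)
stat_clustered = chi_square_statistic(observed, clustered)
```

A smaller statistic means the observed counts sit closer to that hypothesis's prediction, so with these invented numbers the clustered baseline fits much better than the uniform one. Real expected shares would have to come from somewhere defensible, such as population, public opinion surveys, or the locations of Tor exit nodes.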
I actually suspect that the spatial distribution will vary based on whether we look at hate mail sent directly to researchers or at annoying posts on blog sites, but . . .
Awesome, who wants in? Who would publish this? How much time would this take away from otherwise productive science?