Sifting Around: Analyzing “Game of Thrones” Social Conversations

Justin Breucop
9th May 2014

To start things off, I’d like to introduce Sifting Around, a new series where I’ll demonstrate the analytics process I use when exploring a current event from a social data perspective. From writing a CSDL script to creating visualizations, I’ll give you the how and why behind each step and highlight different DataSift enrichments that I found particularly useful. Conclusions drawn here are entirely my own and are guaranteed by no one, nor is any code here supported by myself or DataSift at large. So, without further ado…

Game of Thrones
GOT is hard to get away from. I go online and all my friends post spoilers, plan watching parties, post photos of their themed tattoos, etc. My friends (and maybe my new tattoo) can be very nerdcore about the show, so I started wondering: how do people talk about Game of Thrones online? We’re halfway through the season, so it seems there may be some great data to look at. Enter DataSift, stage left.

Building a Game of Thrones filter
Step 1: Develop the filter and determine the data load
I whipped up some CSDL and then verified the amount of data and noise levels using both Live and Historic Preview.

From the interaction volume chart, I knew to expect about 9 million interactions, which is more than I’d care to struggle with. I decided to limit the results to a 10 percent sample; 880 thousand interactions are more than enough to glean some interesting insights. (As a side note, unless you’re performing deep analytics or machine learning, anything over 2 million is overkill.)
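To make the sampling step concrete, here is a minimal Python sketch of a deterministic 10 percent sample. It is only an illustration of the idea, not how DataSift samples: the function names and the hash-bucket approach are assumptions, and the interaction IDs are made up.

```python
import hashlib


def keep_in_sample(interaction_id: str, percent: float = 10.0) -> bool:
    """Keep roughly `percent` of IDs by hashing each ID into one of 10,000 buckets.

    Hashing (rather than random.random()) means the same ID always gets
    the same decision, so re-running the job yields the same sample.
    """
    digest = hashlib.sha256(interaction_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000
    return bucket < percent * 100


def expected_sample_size(total: int, percent: float) -> int:
    """Back-of-the-envelope size of a `percent` sample of `total` records."""
    return round(total * percent / 100)


# Hypothetical IDs, only to show that the kept fraction lands near 10 percent.
ids = [f"interaction-{i}" for i in range(100_000)]
kept = sum(keep_in_sample(i) for i in ids)
```

With these definitions, `expected_sample_size(9_000_000, 10)` gives 900,000, which is in line with the roughly 880 thousand interactions reported above for a volume of "about 9 million".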


Step 2: Add tags for easier analysis

The second piece of the Historic Preview I used was the word cloud:

Game of Thrones dominated the word cloud, which makes sense. But there was also a lot of chatter around the characters. I figured it would be interesting to analyze conversations around the characters, like a social media fight for the Iron Throne. I could’ve tagged records in my database based on characters using SQL statements, but DataSift VEDO lets the platform tag the conversations for me. So I added character and keyword tags to make my processing work easier.
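For readers who want a feel for what tagging does, here is a rough Python sketch of the do-it-yourself alternative: attaching character tags to text after the fact. This is not VEDO and not the tag set used in the actual stream; the keyword lists and the `tag_characters` helper are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical keyword lists; the real stream's tags are not reproduced here.
CHARACTER_KEYWORDS: Dict[str, List[str]] = {
    "Joffrey": ["joffrey"],
    "Daenerys": ["daenerys", "khaleesi", "mother of dragons"],
    "Tyrion": ["tyrion"],
    "Arya": ["arya"],
    "Sansa": ["sansa"],
}


def tag_characters(text: str) -> List[str]:
    """Return every character whose keywords appear in the text (case-insensitive)."""
    lowered = text.lower()
    return [
        name
        for name, keywords in CHARACTER_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


tags = tag_characters("Khaleesi and Tyrion should team up")
```

Doing this on the platform side means every record arrives already labeled, so the database work is reduced to counting and grouping.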


Step 3: Refine your filter based on your results

Probably the hardest part of building out the CSDL was figuring out the possible permutations of character names, including common misspellings, and working around common names (such as Jaime and Jon) in the return statement. The final stream looked like this:
http://datasift.com/essence/lqntpu
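The name-matching problem can be sketched outside CSDL as well. The Python below is a hedged illustration, not a translation of the stream linked above: the misspellings, the context terms, and the `match_characters` helper are all assumptions. The idea is that distinctive names (and their misspellings) match on their own, while common first names such as Jaime and Jon only count when the text also mentions the show.

```python
import re
from typing import List

# Distinctive names and full names: safe to match on their own.
# The misspellings listed are guesses, not taken from the real filter.
FULL_NAMES = {
    "Daenerys": re.compile(r"\b(daenerys|danerys|daenarys|khaleesi)\b", re.I),
    "Joffrey": re.compile(r"\b(joffrey|joffery|jeoffrey)\b", re.I),
    "Jaime": re.compile(r"\bjaime\s+lannister\b", re.I),
    "Jon": re.compile(r"\bjon\s+snow\b", re.I),
}

# Common first names: only counted when the show is mentioned too.
FIRST_NAMES_NEEDING_CONTEXT = {
    "Jaime": re.compile(r"\bjaime\b", re.I),
    "Jon": re.compile(r"\bjon\b", re.I),
}

SHOW_CONTEXT = re.compile(r"game\s*of\s*thrones|#got\b", re.I)


def match_characters(text: str) -> List[str]:
    """Return the sorted list of characters the text appears to mention."""
    found = {name for name, pattern in FULL_NAMES.items() if pattern.search(text)}
    if SHOW_CONTEXT.search(text):
        found.update(
            name
            for name, pattern in FIRST_NAMES_NEEDING_CONTEXT.items()
            if pattern.search(text)
        )
    return sorted(found)
```

For example, `match_characters("Jon is coming to dinner tonight")` returns an empty list, while the same first name in a post tagged `#GameOfThrones` is counted.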


Step 4: Export the results for analysis

The next step was setting up my MySQL database and then running a Historics query for all of this season’s episodes. I made sure to store the tags in an Entity-Attribute-Value model for easy sorting. Then I connected the database to Tableau and built a few baseline views of the data.
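As a sketch of what an Entity-Attribute-Value layout for tags might look like, the Python below uses an in-memory SQLite database as a stand-in for MySQL. The table name, column names, and sample rows are assumptions for illustration; the actual schema is not shown in this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the MySQL database
conn.execute(
    """
    CREATE TABLE interaction_tags (
        interaction_id TEXT NOT NULL,  -- entity: one interaction
        attribute      TEXT NOT NULL,  -- tag namespace, e.g. 'character'
        value          TEXT NOT NULL   -- tag value, e.g. 'Joffrey'
    )
    """
)

# Made-up rows: one interaction can carry any number of tags.
rows = [
    ("i1", "character", "Joffrey"),
    ("i1", "keyword", "spoiler"),
    ("i2", "character", "Daenerys"),
    ("i3", "character", "Joffrey"),
]
conn.executemany("INSERT INTO interaction_tags VALUES (?, ?, ?)", rows)


def count_by_value(attribute: str) -> dict:
    """Count distinct interactions per tag value within one attribute."""
    cursor = conn.execute(
        "SELECT value, COUNT(DISTINCT interaction_id) "
        "FROM interaction_tags WHERE attribute = ? "
        "GROUP BY value ORDER BY value",
        (attribute,),
    )
    return dict(cursor.fetchall())
```

The appeal of this layout is that a new tag never requires a schema change, and baseline views such as volume per character reduce to a single `GROUP BY`.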

Nothing surprising here (aside from the Monday spike for the second episode). Social media tends to focus on buzz, and the first two episodes definitely had the biggest game changers. What about spoilers?

One hour after the second episode aired for the first time (7:00 p.m. PST), people talked about spoilers a lot, relative to the general conversations.

Let’s see which characters generated the largest buzz during these episodes. To visualize the conversations about characters, I created bubble charts sized by volume of conversations.

Joffrey seemed to steal the lion’s share of the conversations (pun intended) in the second episode, and then discussion died off steadily after that (spoiler pun doubly intended).

Now for a less standard visualization: how does each character’s percentage share of the conversation change over time? This is essentially an easier-to-digest alternative to the previous chart:

This is also sorted by overall percentage. Sansa, Arya, Daenerys and Tyrion never stray too far from the limelight. Alas, the evil king has dominated season four so far, but it looks like the Mother of Dragons is on the up and up. She must do what queens do!
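The percentage-share view boils down to a small calculation: for each episode, divide each character’s mention count by the total character mentions for that episode. Here is a minimal Python sketch under that assumption; the `share_by_episode` helper and the sample data are made up and do not reflect the real numbers behind the chart.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def share_by_episode(
    mentions: Iterable[Tuple[int, str]]
) -> Dict[int, Dict[str, float]]:
    """Turn (episode, character) pairs into per-episode percentage shares."""
    counts = defaultdict(Counter)
    for episode, character in mentions:
        counts[episode][character] += 1

    shares = {}
    for episode, counter in counts.items():
        total = sum(counter.values())
        shares[episode] = {
            character: 100.0 * n / total for character, n in counter.items()
        }
    return shares


# Fabricated toy data, only to exercise the function.
mentions = [
    (1, "Joffrey"), (1, "Daenerys"),
    (2, "Joffrey"), (2, "Joffrey"), (2, "Joffrey"), (2, "Tyrion"),
]
shares = share_by_episode(mentions)
```

Because each episode’s shares sum to 100, a character can lose share even when their raw volume grows, which is why this view complements the bubble charts rather than replacing them.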


Key takeaways:
• Sample data if the volume is over a million records
• Use VEDO tags for easier post-processing work
• Anticipate common spelling mistakes
• Have fun!

I’ll be back to analyze other trends and keep this series alive. Sound off in the comments if you’d like to see me explore a particular topic.

Justin Breucop is a Solutions Consultant at DataSift. Connect with Justin on Twitter

  • Dan Grady

    Well done.