From a performance of “the Harlem Shake” by mobs of people in Tokyo to #makeuptransformations by women and men alike, social networks like Tumblr are spreading internet memes and slang around the world at a rate never seen before.
In today’s “Sifting Around”, I’m going to take a look at the term “Bae”. Not the global defense and aeronautics company BAE Systems, mind you. I’m referring to a slang word that Urban Dictionary defines as “Before Anyone Else, or another way to say ‘baby’”.
Bae has been around for a long time, but “Bae” recently took off as a commonly used term around the time Pharrell Williams and Miley Cyrus released a music video titled “Come Get it Bae”, spawning a series of other slang exploration articles. I’m going to look at trends and influencers associated with the tag “Bae” on the popular social network, Tumblr.
Step One: Defining The Stream
Let’s start with the first step of the process: what am I looking for? And the answer is: tumbles tagged with bae and potential derivatives. So my first run with CSDL was tumblr.tags in “bae” to see what would come up. And what came up was definitely not safe for work (sorry, coworkers sitting next to me). There were a lot of tumbles I found relevant or possibly interesting, but these nsfw tumbles were clogging up my data.
This is a common occurrence in a data exploration exercise. You have a plan and write some CSDL, but suddenly your data is filled with chaff. The solution is to optimize your CSDL, but that is not always a straightforward process. There are a lot of techniques, but for this exercise I kept it simple. I did a small data pull over a few days and looked at the 50 most frequent tags, which happened to include nsfw towards the top. I picked out the tags that were clearly inappropriate and excluded them in the CSDL. Kenneth Bae was also a common one, so I went ahead and excluded that as well.
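For anyone who wants to try the same tag-frequency check on their own pull, here is a minimal Python sketch. It assumes each post has already been loaded as a dict with a "tags" list; that shape, and the function names top_tags and exclude_posts, are hypothetical stand-ins for illustration, not the actual DataSift output format or the code used for this post.

```python
from collections import Counter


def top_tags(posts, n=50):
    """Return the n most frequent tags as (tag, count) pairs."""
    counts = Counter()
    for post in posts:
        # Count each tag once per post, ignoring case.
        counts.update({tag.lower() for tag in post.get("tags", [])})
    return counts.most_common(n)


def exclude_posts(posts, blocked_tags):
    """Drop every post that carries at least one blocked tag."""
    blocked = {tag.lower() for tag in blocked_tags}
    return [
        post
        for post in posts
        if not blocked & {tag.lower() for tag in post.get("tags", [])}
    ]


if __name__ == "__main__":
    sample = [
        {"tags": ["bae", "nsfw"]},
        {"tags": ["Bae", "selfie"]},
        {"tags": ["bae", "kenneth bae"]},
    ]
    print(top_tags(sample, n=5))
    print(exclude_posts(sample, ["nsfw", "kenneth bae"]))
```

Eyeballing the top of that list is how you pick the blocked tags by hand; the same exclusions would then go back into the stream definition so the chaff never gets delivered in the first place.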
Step Two: Delivering Output
After pulling data for several months, I dumped it into a PostgreSQL table and explored it with Tableau. The most interesting comparison to make is the volume over time of original content vs. total content. I pulled in reblogged tumbles as well. They appear like any other post, but a reblog is simply one blog sharing another blog’s content. This can create a tumble network, which is cool, but the more useful part is that comparing original volume with total volume separates a.) what’s being generated from b.) what’s popular. Take a look at the data, including a graph that shows a three-day moving average:
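For readers who would rather compute that kind of smoothing in code than in Tableau, here is a rough Python sketch of daily volume plus a trailing three-day moving average. The "day" and "is_reblog" fields are assumed names for illustration, not the real column names in my table, and treating days with no posts as zero is a choice of this sketch.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(posts, originals_only=False):
    """Count posts per day, optionally skipping reblogs."""
    counts = Counter()
    for post in posts:
        if originals_only and post["is_reblog"]:
            continue
        counts[post["day"]] += 1
    return counts


def moving_average(counts, window=3):
    """Trailing moving average over consecutive days.

    Days with no posts count as zero. The first (window - 1) days are
    skipped because they do not have a full window behind them yet.
    """
    if not counts:
        return {}
    start, end = min(counts), max(counts)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    values = [counts.get(day, 0) for day in days]
    return {
        days[i]: sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(days))
    }


if __name__ == "__main__":
    first = date(2014, 8, 1)
    sample = [
        {"day": first, "is_reblog": False},
        {"day": first, "is_reblog": True},
        {"day": first + timedelta(days=1), "is_reblog": True},
        {"day": first + timedelta(days=2), "is_reblog": False},
    ]
    print(moving_average(daily_counts(sample)))
    print(moving_average(daily_counts(sample, originals_only=True)))
```

Running it once on all posts and once with originals_only=True gives the two series to compare: total content vs. original content.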
And now for the magic question: what are the most commonly associated tags? The results were surprising at first, because when I compared the tags on original tumbles with the tags on reblogged tumbles, the bae-associated tags were pretty different.
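If you want to run that original-vs-reblogged comparison yourself, a small Python sketch could look like the following. As before, the "tags" and "is_reblog" fields and the associated_tags function are assumptions made up for this example, not the schema or code behind the charts in this post.

```python
from collections import Counter


def associated_tags(posts, seed="bae", n=10):
    """Return the top co-occurring tags for original and reblogged posts.

    Only posts that carry the seed tag are considered, and the seed
    itself is left out of the counts.
    """
    original, reblogged = Counter(), Counter()
    for post in posts:
        tags = {tag.lower() for tag in post.get("tags", [])}
        if seed not in tags:
            continue
        target = reblogged if post.get("is_reblog") else original
        target.update(tags - {seed})
    return original.most_common(n), reblogged.most_common(n)


if __name__ == "__main__":
    sample = [
        {"tags": ["bae", "selfie"], "is_reblog": False},
        {"tags": ["bae", "5sos"], "is_reblog": True},
        {"tags": ["bae", "5sos", "1d"], "is_reblog": True},
    ]
    originals, reblogs = associated_tags(sample)
    print("original:", originals)
    print("reblogged:", reblogs)
```

Keeping the two counters separate is the whole point: a tag that dominates reblogs can barely show up in original posts, and the other way around.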
Defining a Trend: Celebrity Baes
Apparently, a lot of people take selfies and believe themselves to be “baes”. However, I found another trend in the analysis. Many Tumblr users post groups or shows as their “baes”, as you can see in the data above. Here’s a translation for those not currently up on Tumblr’s top crushes du jour:
TOP CELEBRITY BAES ON TUMBLR:
1. 5sos: the boy band “5 Seconds of Summer”
2. 1D: One Direction, another popular boy band
3. Tokyo Ghoul: a Japanese manga about a half man/half ghoul
4. Teen Wolf: an MTV show. Fairly self-explanatory concept
5. Nash Grier: the self-proclaimed “King of Vine”/YouTube star
6. Cameron Dallas: another Vine/YouTube star
7. Sebastian Stan: “Captain America” actor
Get Started with Tumblr Data…for Free!
I hope that this showed how easy it is to ask a question and start exploring the data in only a few minutes. I’ll be back again with more DIY exploration projects (like a Martha Stewart of data, or something like that). In the meantime, you can work with Tumblr data yourself. Just sign up for a DataSift account, and you’ll get a $10 credit.