Bringing Big Data Techniques to Human-Generated Data

Zuzanna Pasierbinska-Wilson

Humans are social creatures, and that may be even truer online. Consider the following statistics from early 2014 about the most popular social networks and blogging platforms and their memberships:

  • Twitter has 560 million registered users, more than half of them active. As of a couple of years ago, 200 million tweets were sent every single day.
  • WordPress powers 20% of the top global sites and adds 40 million posts per month, while Disqus’ commenting engine claims 1 billion monthly unique users.
  • Finally, we have Facebook with more than one billion (with a capital B) active users. Instagram, owned by Facebook, has 150 million active users.

All of those users, those humans, are posting about their desires, their interactions, their interests, their whims, and their experiences with your company, with your competitors, and with other businesses that intersect with their desires and from whom you could learn. The result is a tremendous pool of what DataSift calls Human Data: the entire spectrum of human conversations and human-generated content. It needs to be analyzed and teased apart to yield insights on how to better serve your customers and grow your business’s revenue. Make no mistake: human-generated data is itself Big Data.

And the end game of Big Data is almost identical to the end game of human-generated data. The challenge, of course, is how to turn all of this data into insights that let you predict how your current and future customers will behave and how best to reach them to influence that behavior, with accountability for both the money you spend purchasing that opportunity to influence and the measurable return on that investment.

What are some techniques that can be used to accomplish this objective with this Big Human Data?

Ensure you have real-time data.
Unlike data warehouses, which hold data that is sometimes many years old, human-generated data becomes stale very fast. To really tease insights from social happenings, you need a way to turn on the firehose, as it were, and drink from it at a consistently high capacity.
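
One way to picture keeping only fresh data is a sliding time window over incoming posts. The following is a minimal Python sketch, not a description of any particular product: the class name SlidingWindowCounter and its methods are hypothetical, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events seen within the last `window_seconds` (hypothetical helper)."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # timestamps, oldest first

    def add(self, timestamp):
        """Record one event; assumes timestamps never go backwards."""
        self.events.append(timestamp)
        self._evict(timestamp)

    def count(self, now):
        """Return how many recorded events are still inside the window."""
        self._evict(now)
        return len(self.events)

    def _evict(self, now):
        # Drop events that have aged out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
```

Stale posts fall out of the window automatically, so any count read from it reflects only recent activity.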

Have a good descriptive set of metadata.
Good metadata, the “fingerprints” that show where data is from and describe that data and its properties, helps to sort the wheat from the chaff and enables thousands of miniature comparisons to be made in a second by a machine. I wrote about metadata previously and commented that it can help overlay context and patterns onto a set of unstructured data, whether it is traditional Big Data, Human Data, or both.
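
As a rough illustration of those machine comparisons, the Python sketch below filters posts on their metadata fields. The Post record, its fields, and the select helper are all assumptions made up for this example, not a real schema.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical record: the text plus a few metadata fields."""
    text: str
    source: str      # e.g. "twitter" or "blog"
    language: str    # e.g. "en"
    timestamp: int   # seconds since the epoch


def select(posts, **criteria):
    """Return the posts whose metadata equals every given criterion."""
    return [
        post for post in posts
        if all(getattr(post, field) == value for field, value in criteria.items())
    ]
```

For example, select(posts, source="twitter", language="en") keeps only English-language tweets without ever reading the text itself.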

Perform sentiment analysis.
Sentiment is the measure of whether the feeling embodied in some text is positive, neutral or negative. There are other nuances that can be hinted at in text, but in general, sentiment is the measure of the subjective, and it can tell a lot about a person and his or her demeanor. In particular, what he or she tends to talk about, write about, and do can be very much affected by his or her sentiment. It is a powerful tool in the Human Data analytics arsenal, although it is not unique to human-generated data.
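
To make the positive, neutral or negative labels concrete, here is a deliberately simple Python sketch that counts words from two small lists. The word lists and the sentiment function are toy assumptions for illustration. Production systems typically rely on much larger lexicons or trained models.

```python
import re

# Toy lexicons, assumed for illustration only.
POSITIVE = {"love", "great", "good", "happy", "excellent"}
NEGATIVE = {"hate", "bad", "terrible", "awful", "slow"}


def sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A word-counting approach like this misses sarcasm and negation ("not good"), which is part of why the other nuances in text are hard to capture.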

Build correlations between data from different sources.
Human-generated data covers all types of content created online. Is this copious data linked with anything else? Is there a unified way to view all of this data, both in real time and also over time? Much of Big Data is about mapping connections, and human-generated data proves to be no exception to this. Having a unified view of Human Data helps to unlock meaning and insight.
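
A unified view can be as simple as grouping records from several sources under a shared key. The Python sketch below assumes each record already carries a "topic" field, which is a hypothetical simplification. The unified_view function is likewise made up for this example.

```python
from collections import defaultdict


def unified_view(*sources):
    """Merge (source_name, records) pairs into {topic: {source_name: count}}."""
    view = defaultdict(lambda: defaultdict(int))
    for name, records in sources:
        for record in records:
            view[record["topic"]][name] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {topic: dict(counts) for topic, counts in view.items()}
```

Calling unified_view(("twitter", tweets), ("blog", blog_posts)) shows, per topic, how many mentions came from each source, which makes cross-source patterns easier to spot.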

Analyze all of the content you monitor for relationships.
Twitter and Facebook aren’t the only places you can find human-generated data. You can derive value from looking at audio and video, text, and images that people leave on blogs, within news content, and on other social networks besides the Big Two. You can understand more about your customers: their intent, likes and dislikes, topics they discuss and content they reference.

If you want to dive into any of the above techniques a bit deeper, view our recent webcasts.

Written by Zuzanna Pasierbinska-Wilson

Zuzanna is SVP, Marketing at DataSift. You can follow her @fattypontoonski.
