How to Use DataSift for Brand Monitoring – Historical and Real-Time

7th October 2014

Timely, accurate and cost-effective brand monitoring is a Holy Grail for any organization. But how do you find out about the conversations happening around your brand? Where the content is being shared? By how many users? Or who your audiences are? DataSift provides a powerful means of monitoring your brand: collecting, consuming and analyzing the social data you need to underpin your decision-making.

In this Brand Monitoring use case scenario, we will assume your organization has the following prerequisite components for use in gathering brand insights and building audience profiles:

  • DataSift platform (source of data)
  • Historics preview (filter evaluation)
  • Amazon S3 (data destination)
  • Consolidator shell script (file handling)
  • PostgreSQL with Foreign Data Wrapper (database)
  • Tableau (visual analytics platform)

This DataSift use case will now take you through the end-to-end process for brand monitoring, including:

  • Brand identification
  • CSDL filter definition
  • Historics for preview and data acquisition
  • Configure data destination (Amazon S3)
  • File consolidator
  • Load data files into PostgreSQL
  • Data handling in PostgreSQL
  • Visualizations in Tableau

Brand Identification
Brand identification is crucial: it is both the starting point and the end point of the process. It determines the CSDL filter criteria, which data points are of interest and what to measure within the output.
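For illustration only, a CSDL filter for a hypothetical brand might look like the following; the brand terms here are placeholder assumptions, not a recommended filter:

```csdl
// Match content mentioning a hypothetical brand or its Twitter handle,
// restricted to English-language interactions.
interaction.content contains_any "ExampleBrand, @examplebrand"
AND language.tag == "en"
```

In practice you would iterate on the term list after reviewing the Historics Preview word cloud, adding exclusions for any noisy terms it surfaces.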

Historics Preview
Having defined the CSDL, it is recommended you run a Historics Preview to understand the data volume and potential noise. The process uses a 1% sample over the period specified (up to 30 days) and returns useful information, such as interaction counts, interaction types, languages, and even a word cloud so you can identify any potentially noisy terms. To create a Historics Preview, open your stream, click the “Historics Preview” button and select the time period you require.


Configure the Data Destination (Amazon S3)
Before the data can be collected (via a DataSift Historics task), a data destination has to be defined, i.e. where DataSift will deliver the data. Let’s assume it is Amazon S3, the cloud-based file storage service. You can easily create an Amazon AWS account and configure your Amazon S3 account for DataSift. When configuring the S3 destination, be sure to use the “JSON_New_Line” format, which accelerates file consolidation. The S3 destination in DataSift also allows you to prefix your files.

Historic Data Pull
With the filter created and the data destination configured, the DataSift Historics task can be run. Open the stream and select Historics Query. Select the timeframe you want (<=30 days) and the sources of interest.


File Consolidator
The Amazon S3 data destination creates a new file with each delivery, so the resulting data set is a collection of files. The file consolidator script concatenates all of the individual files into a single file while retaining the newline-delimited format. Placing the consolidator script into the same directory as the downloaded S3 files and running it creates one new JSON file. You can find more information here.
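The consolidation itself can be sketched in a few lines of shell. This is a simplified stand-in for the downloadable script, and the output name `consolidated.json` is an assumption:

```shell
#!/bin/sh
# Concatenate every newline-delimited JSON file in the current directory
# into one file. "consolidated.json" is a hypothetical output name.
out=consolidated.json
: > "$out"                       # start with an empty output file
for f in *.json; do
  [ "$f" = "$out" ] && continue  # never ingest the output file itself
  awk 1 "$f" >> "$out"           # awk 1 echoes each line and guarantees
done                             # a trailing newline per input file
```

Using `awk 1` rather than plain `cat` ensures a file without a final newline cannot run its last record into the first record of the next file.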

Load Data Files into PostgreSQL
There are two reasons for using PostgreSQL in conjunction with the Foreign Data Wrapper:

  • Reading and parsing JSON data
  • UNNEST function

The Foreign Data Wrapper reads and parses the JSON data as it loads it into PostgreSQL, recognizing arrays and storing them as arrays. Once the data is loaded, the power of the UNNEST function can be realized, expanding an array held in a single row into multiple rows. Translating JSON array items into RDBMS table entries makes the complex arrays commonly seen in JSON data (such as hashtags, URLs and mentions) accessible via simple SQL. You can download and install PostgreSQL from the official site; the Foreign Data Wrapper documentation can be found here.

The following tables will be created as part of this process:

  • explore_brand
  • explore_mentions
  • explore_hashtags
  • explore_links

The scripts to load data can be found here.
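As an illustrative sketch only (not the downloadable load scripts), the load step could expose the consolidated file through PostgreSQL’s built-in file_fdw and then materialize the fields of interest. The file path, server name and exact JSON keys below are assumptions:

```sql
-- Sketch: read the newline-delimited JSON file via file_fdw.
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER json_files FOREIGN DATA WRAPPER file_fdw;

-- One row per delivered interaction; the csv quote/delimiter options use
-- control characters so the file's contents are passed through untouched.
CREATE FOREIGN TABLE raw_interactions (doc json) SERVER json_files
  OPTIONS (filename '/data/consolidated.json',
           format 'csv', quote E'\x01', delimiter E'\x02');

-- Materialize the scalar fields of interest.
CREATE TABLE explore_brand AS
SELECT doc -> 'interaction' ->> 'id'      AS "interaction.id",
       doc -> 'interaction' ->> 'content' AS "interaction.content",
       doc -> 'interaction' -> 'hashtags' AS "interaction.hashtags"
FROM raw_interactions;

-- Expand the hashtag array into one row per tag.
CREATE TABLE explore_hashtags AS
SELECT doc -> 'interaction' ->> 'id' AS "interaction.id",
       json_array_elements_text(doc -> 'interaction' -> 'hashtags') AS hashtag
FROM raw_interactions;
```

The same pattern applies to the mentions and links tables, each expanding its own array.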

Data Handling in PostgreSQL
The scripts utilize the UNNEST function to expand arrays into rows. So what does UNNEST look like? The following illustrative example selects an interaction whose single array holds multiple hashtags:

select "interaction.id", "interaction.hashtags"
from explore_brand
where "interaction.id" = '1e41f3b6ff7cad80e074552b8c4787fa' ;

Data Output

Using UNNEST, this can be expanded from one column into multiple rows:

select "interaction.id",
       case when "interaction.hashtags" is null then null
            else unnest("interaction.hashtags")
       end as "interaction.hashtags"
from explore_brand
where "interaction.id" = '1e41f3b6ff7cad80e074552b8c4787fa';

Data Output II

Visualizations in Tableau
With the data schema in place, the data can be explored. Tableau is an ideal solution for data visualizations, as it supports PostgreSQL natively.

Brand sentiment over time
Brand interaction classification

Assets
All associated scripts and assets including example Tableau dashboards can be downloaded from the GitHub repository.
