The news of Twitter’s termination of our data licensing contract will mean disruption for hundreds of companies. For most businesses, this will not just be a case of switching from one supplier to another: over 80% of our customers leverage capabilities that do not exist in Gnip.
This blog post is for those of you who wish to continue using Twitter data. It highlights the main differences that need to be taken into consideration when planning a transition to Gnip. It will also help you identify the features that you will need to deprecate in your own product as there are no workarounds possible.
A fundamentally different approach: Gnip data licensing vs DataSift data processing
A fundamental question to answer is: “If DataSift and Gnip both have the Twitter firehose, how come 80% of DataSift customers use unique capabilities?”
As any developer knows, extracting insights from 500 million Tweets a day is no simple task. The basic steps in collecting and preparing data for analysis are:
Data Extraction/Filtering: Each Tweet is 140 characters of unstructured text, and 30% contain links to content on other sites. How do you decide which data is relevant for your analysis? Sifting the data you want from the data you don’t is a text-mining and filtering problem.
Data Enrichment/Interpretation: How do you interpret text within each Tweet to extract its meaning? For example, understanding the sentiment, topic or intent that’s expressed within a single Tweet. Without this, all you can do is count up vanity metrics on the number of times that a brand name was mentioned. Solving this is a text-analysis problem.
Data Delivery: How do I continually deliver large volumes of the enriched, filtered data into my own platform, ready for further analysis? Given the volumes of data being delivered in real-time, this is a hard problem to solve. Developers want to know that data is guaranteed for delivery, buffered if there is a problem in their own infrastructure, and can easily be mapped to the target database schema they want to receive data in.
Data Licensing: Finally, you pay for the data you received from the firehose. This is the final transaction, which takes place at the end of the month: you pay for what you received, at the Twitter rate of $0.10 for every 1,000 Tweets.
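As a rough illustration of the licensing step, the arithmetic can be sketched in a few lines of Python. The only figure taken from this post is the rate of $0.10 per 1,000 Tweets. The function name, the rounding to whole cents and the example volume are assumptions made for illustration, not a description of how either platform actually bills.

```python
from decimal import Decimal

# Rate quoted in this post: $0.10 for every 1,000 Tweets.
RATE_PER_THOUSAND = Decimal("0.10")


def licensing_cost(tweets_received: int) -> Decimal:
    """Estimate the monthly licensing fee in dollars.

    Hypothetical helper: rounding to whole cents is an assumption,
    not a documented billing rule.
    """
    if tweets_received < 0:
        raise ValueError("tweets_received must be non-negative")
    cost = Decimal(tweets_received) / 1000 * RATE_PER_THOUSAND
    return cost.quantize(Decimal("0.01"))


# Example: 2.5 million Tweets received in a month.
print(licensing_cost(2_500_000))  # 250.00
```

Because the fee scales with the number of Tweets received, how precisely you can filter upstream has a direct effect on the bill.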
The focus for DataSift has always been to provide an integrated platform to do the “heavy lifting” across all these areas, enabling developers to focus on building insights, not infrastructure. In contrast, Gnip has focused on providing simple data extraction capabilities and data licensing. 80% of our customers will have to build new infrastructure.
Capabilities you’ll lose when transitioning
We’ve written a developer blog post that covers this in more detail, but at a summary level, here is a checklist to get you started in thinking about a transition plan. The goal here is to highlight the features in DataSift and the corresponding gaps in Gnip.
When accessing historic data with DataSift you have access to the same augmentation, filtering, classification and delivery features as when accessing real-time streams. Our platform provides a consistent experience across historic and real-time data access.
Accessing historic data through Gnip has the same limitations as their real-time service for augmentation, filtering and classification options. Also, results are returned as sets of raw files, leaving you to handle integration into your application rather than benefiting from the seamless delivery features provided by DataSift push connectors.
Closing the gaps
We remain committed to enabling our customers to gain insights from the universe of social data sources. However, our options to assist customers in closing these gaps are limited, especially given the short deadline of August 13th for transitioning your application to Gnip. We are evaluating how we can best assist and will post an update on this in the coming days.