Processing Twitter Data from Gnip using DataSift

Richard Caudle
21st April 2015

You might have seen the recent announcement that Twitter has ended their partnership agreement with DataSift, meaning that as of August 13th 2015, you will need to license Twitter data directly from Gnip.

If you are an existing DataSift customer you’ll notice there are many feature gaps you’ll need to consider when moving to Gnip. We outlined these in a previous post.

We know that most of our customers use us to process more than just Twitter data. One big advantage of our platform is that you can augment, filter and classify data consistently across all of your sources.

To make your transition as smooth as possible, and to help you continue using our platform features, we’ll soon be releasing a solution that lets you bring data delivered from Gnip’s API into DataSift, so you can still use the majority of our features going forward. This solution will allow you to post-process Gnip content. You will need to create your filters with Gnip beforehand in order to deliver Tweets to the DataSift platform. DataSift will not be interfacing directly with Gnip.

Open Data Platform – Powerful Data Processing
I’m sure you’re aware that we ingest data from many sources. We can do so because our platform is completely generic in the way that it handles data – allowing data from any source to be processed. Ingesting data from Twitter, Facebook, news feeds or any source is exactly what our platform is designed for.

It’s therefore straightforward for our platform to ingest the data you receive from Gnip and provide post-processing of the Twitter data. Post-processing the data from Gnip will help you to carry on providing features critical to your application, as you can continue to:

  • Augment data, adding link metadata, sentiment and VEDO FOCUS topics
  • Filter data, refining the dataset you receive from Gnip using full CSDL features
  • Classify data, applying tags, scores and machine learning using VEDO
  • Deliver data to your destinations using push connectors

The solution will unfortunately not be able to fill every gap, but it will save you investing a vast amount of custom development time, and will allow you to keep providing many of your key features.

Ingesting Twitter Data
Gnip delivers data to customers by offering a number of APIs. We will allow customers to push data they receive from the Gnip API into DataSift via our API.

The process for ingesting data from Gnip will be as follows:

  • Set up rules in your Gnip account using PowerTrack
  • Connect to Gnip’s streaming API, and relay data to DataSift’s API – We will provide an open-source application to help you do so
  • DataSift will transform data to its current Twitter schema (including interaction schema)
  • Use DataSift’s augmentation, filtering, classification features as before
  • Deliver data to your application using one of our push connectors
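
As a rough illustration of the relay step above, here is a minimal Python sketch. It is not the open-source application mentioned above, and it makes several assumptions: that the stream arrives as one JSON activity per line with blank keep-alive lines in between, and that activities are forwarded in batches. The names `parse_activities`, `relay`, `send` and `batch_size` are hypothetical, and `send` stands in for whatever call posts a batch to DataSift's API, whose details are not given here.

```python
import json
from typing import Callable, Iterable, Iterator, List


def parse_activities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSON line, skipping blank and malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are keep-alives
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip a malformed line rather than stop the relay
        if isinstance(record, dict):
            yield record


def relay(
    lines: Iterable[str],
    send: Callable[[List[dict]], None],
    batch_size: int = 100,
) -> int:
    """Forward parsed activities to `send` in batches; return the count relayed."""
    batch: List[dict] = []
    total = 0
    for activity in parse_activities(lines):
        batch.append(activity)
        if len(batch) >= batch_size:
            send(batch)
            total += len(batch)
            batch = []  # start a fresh list so the sent batch is not mutated
    if batch:
        send(batch)  # flush whatever is left when the stream ends
        total += len(batch)
    return total
```

In a real deployment `lines` would come from an HTTP client reading the streaming connection, and `send` would wrap an authenticated POST. Passing `send` in as a callable keeps the batching logic testable without a network connection.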

What Gaps Does This Fill?
In a previous post we outlined the gaps left by transitioning to Gnip. Here’s the same list, but updated to show which gaps are filled by bringing data from Gnip into our platform.

Historic Data Access
You will also be able to process historic data using this solution. Gnip’s historic archive provides dumps of historic data for each time period. You will be able to push this data into the DataSift platform and process it using the same augmentation, filtering and classification features.
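
To give a feel for how historic dumps might be prepared for pushing, here is a small Python sketch. It rests on assumptions not confirmed here: that each dump is a gzip-compressed file with one JSON activity per line, and that file names sort in chronological order. The function names and the `*.json.gz` pattern are hypothetical.

```python
import gzip
import json
from pathlib import Path
from typing import Iterator


def read_archive(path: Path) -> Iterator[dict]:
    """Yield each JSON object found in one gzip-compressed dump file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore empty lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip a malformed line rather than abort the file
            if isinstance(record, dict):
                yield record


def read_archives(directory: Path, pattern: str = "*.json.gz") -> Iterator[dict]:
    """Yield activities from every matching file, in file-name order."""
    for path in sorted(directory.glob(pattern)):
        yield from read_archive(path)
```

The records yielded by `read_archives` could then be batched and posted to the platform in the same way as live data.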

When Will the Solution be Available?
No doubt you’re already assessing the impact of the transition on your product.

We will be making an evaluation version of this solution available from the middle of May. This will allow you to test ingestion of data from Gnip, demonstrating:

  • Data ingestion – ingestion of data from Gnip
  • Schema normalization – translation from Gnip’s schema to DataSift’s schema
  • Augmentations – continue to use our augmentations
  • Filtering & classification – continue to use your current filters
  • Delivery – continue to deliver to your destinations

The evaluation version will be a rate-limited but fully-featured solution and will allow you to assess this potential route for your transition.

Keep Up-to-Date
We’re sorry that you will need to make the move to Gnip to consume Twitter data, but we’ll try to help make this transition as painless as possible. We’ll update you as soon as possible on when we can provide this solution to our customers.

We recommend you start to explore working with the Gnip API and creating your first filters, as whichever route you choose, you will need to source your data from Gnip. Please contact your account representative to explore processing Twitter data on the DataSift platform, or email us.

