In partnership with Datameer, we are very pleased to announce a new product today: DataSift Insights. DataSift is extremely powerful at extracting content and augmenting it in real time, but it cannot perform any kind of analysis on that data. Datameer has a platform that brings the power of Hadoop to every user through a spreadsheet-like interface, which in turn generates a pipeline of map-reduce tasks.
We have spent the last year building out a significant Hadoop storage solution in which we can store customers' streams and record many of our real-time partner feeds, including the whole Twitter Firehose. We currently have over 400 TB of storage available for our customers to record data and for us to store the Twitter Firehose. So far, that Hadoop cluster has been used purely for storage. Now, through Datameer, we can perform massive computational tasks on those datasets for our customers.
What inspired me most about the Datameer platform was how the editor works. Given that a typical DataSift dataset can run to hundreds of millions of rows, it would of course be unwieldy to work on that volume of data directly. What Datameer has revolutionised is that it takes a small but statistically relevant part of that data and lets you manipulate it through a spreadsheet-style interface, simulating the results in real time. The spreadsheet supports all the usual functionality, such as grouping, sorting, filtering, inner joins and outer joins. In total, it has over 180 different functions.
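To make the sample-then-preview idea concrete, here is a minimal Python sketch. It is not Datameer's implementation, which is not described here; it assumes plain reservoir sampling and a simple group-and-count, and the names `sample_rows`, `preview_group_count` and `make_rows`, along with the `language` field, are hypothetical and used only for illustration.

```python
import random
from collections import Counter


def sample_rows(rows, k, seed=0):
    """Reservoir-sample up to k rows from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


def preview_group_count(rows, key, k=1000, seed=0):
    """Group a small sample by `key` and count rows per group."""
    sample = sample_rows(rows, k, seed)
    counts = Counter(row[key] for row in sample)
    return counts.most_common()


def make_rows(n):
    """Hypothetical stand-in for a large recorded stream."""
    languages = ["en", "es", "ja"]
    for i in range(n):
        yield {"id": i, "language": languages[i % 3]}


# Preview a grouping on 1,000 sampled rows instead of all 100,000.
preview = preview_group_count(make_rows(100_000), "language", k=1000)
print(preview)
```

The preview only touches the sample, so it stays fast however large the full dataset is; the same grouping would then be run over the complete data as a batch job.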
Lastly, Datameer also has a suite of visualisations built into a dashboard, which lets you take the results of the map-reduce tasks and, with a couple of clicks, turn them into a whole host of charts.