Building the Future

Nick Halstead
4th August 2011

I wanted to share something about how we view the future of the data ecosystem. We believe building an ecosystem of applications on top of our platform will radically change the data market and create a marketplace for businesses and developers alike.

We have spent four years learning about scale and about what it costs companies to build the kind of infrastructure required to deal with the data volumes involved in something like the Twitter Firehose. Not everyone may know this, but we also run a little website called TweetMeme – it deals with 500 million API requests and consumes 6Tb of bandwidth every day. With DataSift we have built something that is truly awe-inspiring in power and flexibility. We think in scales of millions of simultaneous streams, not hundreds. We think of data processing that involves millions of complex decisions per stream, not just a few simple keywords. The future of data processing needs the power of the cloud and we ARE the cloud.

For those interested, this is a diagram of our platform at a very high level:

[Figure: diagram of our platform at a very high level]

We have also spent the last six months testing our platform with corporates: Fortune 500, Retail and Media, Financial Services, Travel and Education. We have shown that we have a scalable, flexible platform that meets whatever demands they have. We are new to the market, but we know that when they take the Pepsi challenge we always win.

It is easy to misunderstand our focus on developers, but the reality is that behind every corporate is a team of developers. And we believe developers can change the world – give them the tools and they will build the future. Data aggregation and licensing is a huge technical challenge, and thousands of companies waste massive resources re-inventing the wheel to build their own. Traditional models set the barrier to entry far too high for most companies; we break the mold by giving on-demand access to a single tweet or a billion.

What makes us different?

  • Track Tweets from every person who follows Lady Gaga who also follows Barack Obama
  • Track 100,000+ Geo Locations simultaneously
  • Gender Detection, Political, Interest and Authority Segmentation
  • Pattern matching (via regular expressions) – like looking for every ISBN mentioned on Twitter
  • Real-time Sentiment Analysis and Natural Language processing (entity extraction)
  • Detect over 30 languages (in real-time)
  • Record every Tweet into what we call ‘BigStore’ for later retrieval or Map-Reduce
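To make the pattern-matching bullet concrete, here is a minimal sketch of what "looking for every ISBN" in a piece of text could involve. It is an illustration in plain Python, not DataSift's filtering language or implementation; the names `ISBN_CANDIDATE`, `is_valid_isbn` and `find_isbns` are made up for this example. A regular expression finds candidates, and the standard ISBN-10 and ISBN-13 check digits weed out look-alikes such as phone numbers.

```python
import re

# Hypothetical helper names; this is an illustrative sketch, not DataSift code.
# A candidate is 10 or 13 digits (13-digit ISBNs start with 978 or 979),
# optionally separated by hyphens or spaces; ISBN-10 may end in X.
ISBN_CANDIDATE = re.compile(r"\b(?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx]\b")


def is_valid_isbn(digits: str) -> bool:
    """Check the ISBN-10 or ISBN-13 check digit of a separator-free string."""
    if re.fullmatch(r"\d{9}[\dXx]", digits):
        # ISBN-10: weights 10..1, final X counts as 10, sum divisible by 11.
        total = 0
        for i, ch in enumerate(digits):
            value = 10 if ch in "Xx" else int(ch)
            total += (10 - i) * value
        return total % 11 == 0
    if re.fullmatch(r"\d{13}", digits):
        # ISBN-13: alternating weights 1 and 3, sum divisible by 10.
        total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                    for i, ch in enumerate(digits))
        return total % 10 == 0
    return False


def find_isbns(text: str) -> list[str]:
    """Return every valid ISBN in the text, with separators removed."""
    found = []
    for match in ISBN_CANDIDATE.finditer(text):
        digits = re.sub(r"[- ]", "", match.group())
        if is_valid_isbn(digits):
            found.append(digits)
    return found


if __name__ == "__main__":
    tweet = "Just finished 978-0-306-40615-7, loved it. Call me on 1234567890"
    print(find_isbns(tweet))
```

The regular expression alone would also accept the ten-digit number at the end of the sample tweet; the checksum step is what rejects it. Applying this kind of match to a live stream is a different problem of scale, which the sketch does not attempt to address.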

We look forward to inviting you all into DataSift very soon and building the future.

  • Anonymous

    I guess the powerful realtime filtering capabilities you provide is worth any platform schema to understand that gnip is far from competing with your platform. 🙂
