Ruby Wrapper for DataSift

22nd December 2010 | 0 Comments

Are you a developer currently investigating DataSift and our offerings? If you use Ruby, or are after a wrapper for our API, you might find this little project on GitHub by Steven Shingler of interest. We thought it was worth bringing to your attention so that you don’t need to write your own from scratch!

If you have created any source code that you feel others could use in their projects, feel free to share it on GitHub or similar and let us know! We’ll start adding links to these projects in our documentation area so that they are easy for everyone to find.

All our documentation can be found in our Knowledge Base!

Building on top of other Streams

10th December 2010 | 6 Comments

DataSift lets any user build upon a stream that they or another user has created. The concept is pretty simple: each stream has a unique identifier, which you will find when you select the stream. To get it, click on Stream Identity and copy it! Then start creating your new stream definition! This short video shows you the syntax and where to find the things you might need to create complex streams.



We also made a short video that takes you through our support site, so that you can find all the documentation and see where bug reports go, where your questions get answered and where feedback can be found.



If you liked this post, you might also like this nice overview of DataSift usage scenarios by L. Mohan Arun.

Language Detection has had an Update

9th December 2010 | 0 Comments

We have updated the language detection service within DataSift, as we found that the previous version did not identify the language of interactions as reliably or efficiently as we had hoped. If you’re wondering where to find it, go to Create Stream or Edit Stream; it’s in the CSDL Language Help area, at the bottom of the list.

Thanks to the improved efficiency, we have added support for ten more languages, which are marked (new) in the table below. If there is a language that we do not currently support, please point us towards some sample text in that language to train the detector on (200+ KB from a book or web articles; dictionaries and word lists should be avoided). You can do this by raising a suggestion here and we will try to include it in our next release.

For those interested in the deep technical detail of what has changed, here’s the lowdown. Our system still uses an n-gram based approach to language detection, but it now uses fixed-length trigrams (blocks of 3 characters) instead of variable-length n-grams. This improves processing efficiency, as we no longer loop over the interaction text once for each length of n-gram. Instead, we now generate all the trigrams in a single pass. We also generate the trigrams that span word boundaries, rather than looking at each word in isolation.
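To make the single-pass idea concrete, here is a minimal Python sketch of trigram extraction. It is not our production code: the function name, the lower-casing, and the use of a single space as the word-boundary marker are all assumptions made for illustration.

```python
from collections import Counter


def trigrams(text: str) -> Counter:
    """Count character trigrams in one pass over the text.

    Assumptions (illustrative only): text is lower-cased, runs of
    whitespace collapse to one space, and a space is added at each end
    so the first and last words also produce boundary trigrams.
    """
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter()
    # A single sliding window of width 3; no separate loop per n-gram length.
    for i in range(len(padded) - 2):
        counts[padded[i:i + 3]] += 1
    return counts


sample = trigrams("the cat")
```

Because the window slides across the spaces rather than stopping at each word, trigrams such as "e c" (the end of one word and the start of the next) are counted alongside in-word trigrams such as "the" and "cat".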

Language     language.tag code (ISO 639-1)
Afrikaans    af
Bulgarian    bg (new)
Czech        cs (new)
Danish       da
German       de
Greek        el (new)
English      en
Spanish      es
Finnish      fi
French       fr
Hebrew       he (new)
Hungarian    hu
Icelandic    is (new)
Italian      it
Japanese     ja
Latin        la (new)
Dutch        nl
Norwegian    no
Polish       pl (new)
Portuguese   pt
Romanian     ro (new)
Russian      ru (new)
Swedish      sv
Tagalog      tl (new)
Chinese      zh

Getting Started with DataSift

8th December 2010 | 0 Comments

Now that we’ve sent out our first 400 Alpha invites, we are starting to put together a series of screencasts. The first of these is our getting started screencast. It runs you through the main screens and guides you through creating your first simple stream. Alongside this screencast we also have written documentation, which you can find in our support area.

Creating your first stream is always the hardest, because you want to make it something interesting, not just another test sample! So hopefully this little 5-minute video will help make that a reality! I’m not going to tell you what we decided to create our first stream about… for that you’ll have to play the video!



We will be creating a range of different screencasts, so if there’s something you want an in-depth review of or a deep dive on, just let us know by leaving a comment here or on LinkedIn, Twitter or Facebook.

DataSift talks about what is Curation?

7th December 2010 | 2 Comments

There has been a lot of conversation on our LinkedIn and Facebook pages about what curation is and how it relates to DataSift and our services. Here at DataSift we look at curation in terms of digital curation, which is defined broadly as the selection, preservation, maintenance, collection and archiving of digital data. Here’s what Andy Gott has to say on the subject of Real Time Curation:

Here’s how we do it: we collect and bring together real time streams from Twitter, WordPress.com, SixApart, Google Buzz, MySpace and InfoChimps. (And we will be adding more great real time streams in due course.) By collecting data from reputable, well-known real time sources, we start with a really interesting selection of data to sort through.

On top of the basic curation, we also have the ability to curate content based on sentiment through Salience, authority via PeerIndex and influence based on Klout. Each of these services enables a deeper level of understanding of the output data, providing just the data that you find useful or of interest.

And on that final note I’ll leave you with a quote from Hugh Macken on curation.

What are your thoughts on this area and how do you think the definition of curation will change in the future to take into account the raft of real time data?  What additional services would you like to see integrated with DataSift?

Please do join in the interesting conversations happening around this area on our LinkedIn Group, Facebook Page and Twitter account!

DataSift to release 100 Alpha invites per day!

6th December 2010 | 0 Comments

Last week we opened DataSift up to our first 100 users and that went really well. We’ve had some great feedback from our users and have a few areas that we are now focusing on. As a result, we are ready to invite our next 100 users.

Rather than just adding another 100 users this week, we are going to add 100 new users each day this week! Yes, you read that right! 100 new users per day, each day this week! So if you have signed up to our alpha, please keep an eye on your inbox for your golden ticket!

How DataSift can be used by Brands

With so many different ways to use DataSift, I can imagine figuring out your first stream could be rather daunting. So where do you start? Well, here’s an idea: what’s your favourite brand? Which brands does your company represent, or which do you work for? Now how do you turn that into something that could be created on DataSift? Well, here goes!

First things first, define what you want to find out more about! In my case I like the soft drink Dr Pepper; in fact, so do a lot of our developer team, we’re a bit addicted! So I thought, why not create a rule that picks up on any big news and announcements about Dr Pepper, any mentions of the brand and any links to articles about Dr Pepper? Now I know that sounds like I’m a brand stalker! But wait! You’ll see why this is really cool in a minute.

So now we have the rule created, pulling all those items together, and we get a mass of tweets, articles and blog posts all about Dr Pepper. But to me that’s too noisy! I want to find other Dr Pepper advocates or fans! So I create a second rule using the Lexalytics Salience analysis software and build it on top of the first rule by using its rule ID. I only want the positive sentiment, so I treat anything over 3 as positive. (I tried a smaller threshold, but there was too much neutral content; negative numbers indicate negative sentiment.)
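The layering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not CSDL and not the Salience API: the field names "content" and "sentiment", the helper names, and the sample items are all made up for the example.

```python
from typing import Any, Callable, Dict

Interaction = Dict[str, Any]
Rule = Callable[[Interaction], bool]


def mentions_brand(item: Interaction) -> bool:
    """First rule: keep anything mentioning the brand (hypothetical 'content' field)."""
    return "dr pepper" in str(item.get("content", "")).lower()


def positive_only(base: Rule, threshold: float = 3) -> Rule:
    """Second rule: build on `base`, then also require sentiment above the threshold."""
    def rule(item: Interaction) -> bool:
        return base(item) and float(item.get("sentiment", 0)) > threshold
    return rule


# Invented sample items, purely for illustration.
items = [
    {"content": "Loving this ice-cold Dr Pepper!", "sentiment": 6},
    {"content": "Dr Pepper announces a new flavour", "sentiment": 0},
    {"content": "Dr Pepper went flat, not happy", "sentiment": -4},
    {"content": "Great coffee this morning", "sentiment": 7},
]

fan_rule = positive_only(mentions_brand)
fans = [item for item in items if fan_rule(item)]
```

The second rule never repeats the brand logic; it simply refers to the first rule, in the same way that a stream on DataSift refers to another stream by its identifier.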

Now our output looks far more like what I’m after! From here I can create my own application, service or tool that could, say, follow all the people who tweeted the content, by adding a rule that takes only the Twitter content. Or I could collect just the links through another rule built on top of this one and provide those to my Google Reader as an RSS feed.

It’s my very own curation filter! It does what I tell it! I can even keep my creations private, but where’s the fun in that? I can’t show off my cool streams to my friends if I do that. But it does mean that businesses who want to do private curation can keep their creations private. So there you have it: a shiny new brand monitoring tool just waiting for you to curate to your heart’s content!

If you would like to try out DataSift please sign up to our Alpha Program.