DataSift has a new target for getting a random sample. If you want to test a stream with a small sample, or find out about a particular keyword in a random sample, you can use the interaction.sample target in your stream definition.
The new target takes an input value in the range 0.0 to 100.0. The value is a floating-point number that specifies the percentage of data you want to target.
For example, combining a keyword match on “Apple” with a sample value of 10 will target a random 10% of input feeds with the word “Apple”.
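To make the percentage semantics concrete, here is a minimal Python sketch of how a percentage-based sample combined with a keyword match might behave. It is an illustration only, not DataSift's implementation: the function names `in_sample` and `matches`, and the idea of drawing one uniform random number per interaction, are assumptions made for the example.

```python
import random


def in_sample(percentage, rng=random):
    """Return True for roughly `percentage` percent of calls.

    Assumption: one uniform draw in [0, 100) is made per interaction.
    """
    if not 0.0 <= percentage <= 100.0:
        raise ValueError("percentage must be between 0.0 and 100.0")
    return rng.random() * 100.0 < percentage


def matches(content, keyword, percentage, rng=random):
    """Keyword match combined with a random percentage sample (hypothetical helper)."""
    return keyword.lower() in content.lower() and in_sample(percentage, rng)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the run is repeatable
    interactions = ["I love my Apple laptop"] * 10000
    kept = sum(matches(text, "Apple", 10.0, rng) for text in interactions)
    print(f"kept {kept} of {len(interactions)} interactions")
```

Because the sample is random, the number of interactions kept will only be close to 10% of the matching input, not exactly 10%.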
We will very soon be making some changes to our core API, and I wanted to explain why we are making them. One of the first things we built for DataSift was the REST API. It was based upon the code from TweetMeme, which had served us so well (it still serves tens of thousands of requests per second). We designed the API around a lot of assumptions back then, and with hindsight they were clearly wrong; we wanted to fix them before we made a full public release.
The premise was to reduce complexity and remove confusion over which API methods to use. The core API has been reduced to three methods:
1) Compile
2) Cost
3) Stream
These map to the basic requirements of DataSift: first, you compile your CSDL; second, you can optionally look up the cost breakdown of that code once it is compiled; and third, you stream the data (either via REST or via our streaming APIs).
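The three-step flow can be sketched in Python as follows. This is only a sketch under stated assumptions: the `call` transport function, the method names passed to it, the `hash` identifier returned by the compile step, and the shape of the returned data are all hypothetical placeholders, not the documented API.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: takes a method name and parameters, returns parsed data.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_stream(call: Transport, csdl: str, want_cost: bool = False) -> Dict[str, Any]:
    """Compile CSDL, optionally look up its cost, then fetch stream data."""
    # Step 1: compile the CSDL. The "hash" identifier is an assumed field name.
    compiled = call("compile", {"csdl": csdl})
    stream_id = compiled["hash"]

    # Step 2 (optional): look up the cost breakdown of the compiled code.
    cost: Optional[Dict[str, Any]] = None
    if want_cost:
        cost = call("cost", {"hash": stream_id})

    # Step 3: stream the data for the compiled definition.
    data = call("stream", {"hash": stream_id})
    return {"hash": stream_id, "cost": cost, "data": data}


def fake_transport(method: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """In-memory stand-in for a real HTTP client, for demonstration only."""
    if method == "compile":
        return {"hash": "example-hash"}
    if method == "cost":
        return {"total": 1}
    if method == "stream":
        return {"interactions": []}
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    result = run_stream(fake_transport, "<your CSDL definition>", want_cost=True)
    print(result)
```

Passing the transport in as a function keeps the sketch independent of any particular HTTP library; a real client would replace `fake_transport` with code that talks to the API.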
We are working hard to update all of the libraries that have been written to support our API and, at the same time, adding some new languages.
We have been overwhelmed by the interest in DataSift, and although it has taken us much longer than expected, we believe the end result has been worth waiting for.
Here is a complete list of the changes:
- Authentication, although backward compatible with GET parameters, now also supports header-based authentication, which is the preferred method.
- Dropped (for now) all the methods that allowed altering of the streams stored in the user profile (we will cover the reason and the replacement in a follow-up post).
- Rate limiting is now based upon your user account and not your IP address.
- The rate limit is no longer based just upon the number of requests, as each method has a different cost (e.g. compiling is a lot more expensive than fetching a stream).
- API calls now return proper status codes for errors.
- Return data no longer includes the ‘success’ field (now replaced by the status code).
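As a rough illustration of header-based authentication and status-code handling, the Python sketch below builds an authenticated request and interprets a response without relying on a ‘success’ field. The base URL, the `Authorization` header format, the JSON request body, the `error` field name, and the helper names are all assumptions made for the example; check the API documentation for the real values.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # placeholder, not the real endpoint


def build_request(method, username, api_key, params=None):
    """Build a request that authenticates via a header rather than GET parameters."""
    body = json.dumps(params or {}).encode("utf-8")
    req = urllib.request.Request(f"{API_BASE}/{method}", data=body)
    # Assumed header format; the real scheme may differ.
    req.add_header("Authorization", f"{username}:{api_key}")
    req.add_header("Content-Type", "application/json")
    return req


class ApiError(Exception):
    """Raised when the HTTP status code signals a failure."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def parse_response(status, body):
    """Decide success from the HTTP status code, not from a 'success' field."""
    data = json.loads(body) if body else {}
    if 200 <= status < 300:
        return data
    # The "error" key is an assumed field name for the failure message.
    message = data.get("error", "unknown error") if isinstance(data, dict) else "unknown error"
    raise ApiError(status, message)
```

The sketch only constructs the request and parses a response; nothing is sent over the network until the request is passed to something like `urllib.request.urlopen`.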