Introducing Wikistats – what’s trending on Wikipedia right now

20th June 2012

Today we’re excited to announce Wikistats and add Wikipedia to our ever-increasing list of data sources for our social data platform.

Through Wikistats, DataSift provides real-time insight into the articles that have been trending on Wikipedia over the last 24 hours. Just as we identified the most popular stories on Twitter when we created Tweetmeme, Wikistats is another great showcase of what’s possible with DataSift’s social data platform. By filtering and analyzing the activity stream of new articles and edits on Wikipedia, we’re able to surface the top articles and content being created. As well as providing a view across all articles on Wikipedia, we use our NLP (Natural Language Processing) service to sort articles into popular categories including technology, banking, celebrities, politics, sports, and more.

The importance of Wikipedia to corporates and researchers

There is no denying Wikipedia’s importance as a trusted information source. It is one of the largest user-collaborated data sources in the world, one of the highest-ranking sites in Google search results, and the sixth most popular website overall. In May alone, the site received a total of 15,282,000,000 views; that’s roughly 5,900 views per second, which is even higher than the number of tweets per second!

For corporates, a recent study completed by EthicalWiki, Brands on Wikipedia by the Numbers, suggests that Wikipedia is just as important as any other social media network. In it, David King indicates that companies need to start thinking about Wikipedia as an important part of their content marketing strategy and begin collaborating with Wikipedia editors. At the very least, King says, every company should be updating the talk page with company news so editors can keep the page updated and accurate.

For a company or researcher, tracking mentions of products, companies and people is a major challenge today given the volume of content on Wikipedia. To give some context, edits to Wikipedia total between 11 and 12 million each month, and an average of approximately 7,100 new pages are added each day. In comparison, Bloomberg publishes approximately 5,000 articles per day, and the New York Times publishes approximately 1,000 per day. Considering that most of these articles follow a story over time, the combined number of new stories across two of the world’s largest news organizations would equate to approximately half of what is added to Wikipedia each month.

Wikistats, and the Wikipedia data source we’ve added to our DataSift platform, make tracking all these changes much simpler by giving a glimpse into the hottest topics being updated on Wikipedia in real time.

How Social and Wikipedia come together

One interesting thing to note is that, for the most part, activity on Wikipedia can be correlated with activity in the social landscape.

For example, news of Microsoft’s tablet computer announcement broke on Monday 18th June at 6:30pm EST, and only minutes later, at 6:33pm EST, the first page edit for Microsoft Surface was made on Wikipedia. When the keynote video was released online, a social media frenzy began, and the Wikipedia article for the Surface quickly began climbing the Wikistats ranks.

By the next day, the Wikipedia article on the Microsoft Surface was the number two ranked result in Google search. Given Wikipedia’s high authority in Google search, it is an important, trusted source for individuals looking to learn more about a topic, product or company.

How we identify what’s trending in Wikistats

The Wikipedia activity stream provides us with a torrent of data on new and updated pages. To analyze this stream, our ranking system uses an algorithm that combines the number of edits, the number of unique editors, and the lines added and removed, resulting in a ranking out of 100. By clicking on a particular subject, you are taken to a more detailed summary where you can view the edits per hour alongside the username of each editor and the percentage of edits each has made. This can be useful in identifying how people are collaborating around the creation and update of a Wikipedia article.
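To make the idea concrete, here is a minimal Python sketch of how such a score could be put together. It is not the actual Wikistats algorithm: the `Edit` record, the weights, the log damping, and the scaling against the top article are all assumptions chosen for illustration. It also computes the per-editor percentage shown on the detail page.

```python
from collections import Counter
from dataclasses import dataclass
from math import log1p


@dataclass(frozen=True)
class Edit:
    """One edit event (hypothetical shape, not the real stream format)."""
    article: str
    editor: str
    lines_added: int
    lines_removed: int


def raw_score(edits):
    """Combine edit count, unique editors and line churn into one number."""
    n_edits = len(edits)
    n_editors = len({e.editor for e in edits})
    churn = sum(e.lines_added + e.lines_removed for e in edits)
    # Assumed weights; log1p stops one very large edit from dominating.
    return 1.0 * n_edits + 2.0 * n_editors + 0.5 * log1p(churn)


def rank_articles(edits):
    """Return {article: score out of 100}, scaled against the top article."""
    by_article = {}
    for e in edits:
        by_article.setdefault(e.article, []).append(e)
    raw = {article: raw_score(es) for article, es in by_article.items()}
    if not raw:
        return {}
    top = max(raw.values())
    return {article: round(100 * s / top) for article, s in raw.items()}


def editor_shares(edits):
    """Return {editor: percentage of the given edits made by that editor}."""
    counts = Counter(e.editor for e in edits)
    total = sum(counts.values())
    return {editor: 100 * c / total for editor, c in counts.items()}
```

Calling `rank_articles` on a day's worth of `Edit` records gives each article a score relative to the busiest one, and `editor_shares` on a single article's edits gives the breakdown by username.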

Articles that climb the ranks of Wikistats are typically based on news events, research updates, or highly controversial subject matter. From noon on June 19 to noon on June 20, the top 10 included Victor Spinetti, the Welsh actor who passed away yesterday; Microsoft Surface (tablet), whose keynote speech took place on June 18; and UEFA Euro 2012.

Tapping into the Wikipedia Activity Stream with DataSift

In addition to launching Wikistats, we’re also adding the Wikipedia activity stream as a data source in our Social Data Platform. Companies can create sophisticated filters to track mentions of their company or products across all of Wikipedia, and so follow what the world’s most trusted information source is saying about them.

This is available today as a free data source for DataSift customers. You can learn more about the contents and structure of the stream here.
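As a rough illustration of what tracking mentions involves, the Python sketch below filters a sequence of edit events for a set of terms. It is not DataSift’s filtering language, and the `title` and `diff` field names are hypothetical; the real stream’s fields are described in the documentation mentioned above.

```python
import re
from typing import Iterable, Iterator


def track_mentions(events: Iterable[dict], terms: Iterable[str]) -> Iterator[dict]:
    """Yield events whose title or diff text mentions any of the terms."""
    terms = list(terms)
    if not terms:
        return
    # Whole-word, case-insensitive match on any of the terms.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    for event in events:
        # 'title' and 'diff' are assumed field names for this sketch.
        haystack = f"{event.get('title', '')} {event.get('diff', '')}"
        if pattern.search(haystack):
            yield event
```

For example, `list(track_mentions(events, ["Microsoft Surface"]))` would keep only the events that mention the Surface in their title or changed text.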

And of course you can access Wikipedia content as part of a trial account. Sign up to take it for a test drive!
