As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform. This article describes how BaseX can be used to process the live [http://twitter.com Twitter] stream and store it in databases. It also presents some statistics about the Twitter data and the performance of BaseX.
=Twitter's Streaming Data=
The following figure shows the amount of data per hour that the [https://dev.twitter.com/docs/streaming-apis Twitter Streaming API] delivers to connected endpoints with the 10% gardenhose access,
on the 6th of each of the months February, March, April and May. It is the pure public live stream without any filtering applied.