Difference between revisions of "Twitter"

From BaseX Documentation
Revision as of 10:15, 25 May 2012

As Twitter attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages, or 'tweets', per day), it has become an exciting data source for all kinds of analytics. Twitter provides the developer community with a set of APIs for retrieving data about its users and their communication, including the Streaming API for data-intensive applications, the Search API for querying and filtering the messaging content, and the REST API for accessing the core primitives of the Twitter platform.

This article describes the use of BaseX for processing and storing the live data stream of Twitter. It presents some statistics about the Twitter data and about the performance of BaseX.

Twitter's Streaming Data

The following figure shows the amount of data per hour that the Twitter Streaming API delivers to endpoints connected with the 10% gardenhose access, measured on the 6th of February, March, April and May. It is the pure public live stream, without any filtering applied.

Tweets.png
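
As a rough illustration of how hourly figures of this kind could be derived, the following Python sketch counts tweets per hour from line-delimited JSON messages. It is not the code used for this article. The function name `tweets_per_hour` and the sample messages are hypothetical, and the sketch assumes that each tweet carries a `created_at` field formatted like `Sun May 06 14:03:21 +0000 2012` and that other stream messages (blank keep-alive lines, delete notices) lack that field.

```python
import json
from collections import Counter
from datetime import datetime, timezone

# Assumed timestamp layout of the "created_at" field,
# e.g. "Sun May 06 14:03:21 +0000 2012".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_hour(lines):
    """Count tweets per UTC hour of day from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumed: blank lines are keep-alives, not tweets
        message = json.loads(line)
        if "created_at" not in message:
            continue  # assumed: messages without a timestamp are not tweets
        created = datetime.strptime(message["created_at"], CREATED_AT_FORMAT)
        counts[created.astimezone(timezone.utc).hour] += 1
    return counts


# Hypothetical sample messages, made up for this illustration.
sample = [
    '{"created_at": "Sun May 06 14:03:21 +0000 2012", "text": "first"}',
    '',
    '{"delete": {"status": {"id": 1}}}',
    '{"created_at": "Sun May 06 14:59:59 +0000 2012", "text": "second"}',
    '{"created_at": "Sun May 06 15:00:00 +0000 2012", "text": "third"}',
]
hourly = tweets_per_hour(sample)
for hour in sorted(hourly):
    print(hour, hourly[hour])
```

In practice the lines would come from the open stream connection rather than from a list, and the volume could just as well be measured in bytes per hour instead of message counts.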

Statistics about the data:

Example Tweet (JSON):

Example Tweet (XML):

<repositories>
  <repository>
    <id>basex</id>
    <name>BaseX Maven Repository</name>
    <url>http://files.basex.org/maven</url>
  </repository>
</repositories>

BaseX Performance