This article is part of the [[Advanced User's Guide]]. It is about the usage of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and is generating large amounts of data (over 340 millions of short messages ('tweets') daily), it became a really exciting data source for all kind of analytics. Twitter provides the developer community with a set of [https://devdeveloper.twitter.com/start APIs] for retrieving the data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.
= BaseX as Twitter Storage=
For retrieving the Twitter stream we connect with the Streaming API to the endpoint of Twitter and receive a never -ending tweet stream. As Twitter delivers the tweets as [httphttps://www.json.org/ JSON] objects , the objects has to bedata is converted into XML fragments. For this purpose , the parse function of the [[JSON Module|XQuery JSON Module]] is used. In the examples section both versions are shown ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]). For storing the tweets including the meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].
=Twitter’s Streaming Data=
==Example Tweet (XML)==
<pre classsyntaxhighlight lang="brush:xml"><<json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator"
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates"
arrays="urls indices hashtags user__mentions"
objects="json entities user">> <<contributors/>> <<text>>Using BaseX for storing the Twitter Stream<</text>> <<geo/>> <<retweeted>>false<</retweeted>> <<in__reply__to__screen__name/>> <<possibly__sensitive>>false<</possibly__sensitive>> <<truncated>>false<</truncated>> <<entities>> <<urls/>> <<hashtags/>> <<user__mentions/>> <</entities>> <<in__reply__to__status__id__str/>> <<id>>1984009055807*****<</id>> <<in__reply__to__user__id__str/>> <<source><><a href="http://twitterfeed.com" rel="nofollow">>twitterfeed<</a><></source>> <<favorited>>false<</favorited>> <<in__reply__to__status__id/>> <<retweet__count>>0<</retweet__count>> <<created__at>>Fri May 04 13:17:16 +0000 2012<</created__at>> <<in__reply__to__user__id/>> <<possibly__sensitive__editable>>true<</possibly__sensitive__editable>> <<id__str>>1984009055807*****<</id__str>> <<place/>> <<user>> <<location/>> <<default__profile>>true<</default__profile>> <<statuses__count>>9096<</statuses__count>> <<profile__background__tile>>false<</profile__background__tile>> <<lang>>en<</lang>> <<profile__link__color>>0084B4<</profile__link__color>> <<id>>5024566**<</id>> <<following/>> <<protected>>false<</protected>> <<favourites__count>>0<</favourites__count>> <<profile__text__color>>333333<</profile__text__color>> <<contributors__enabled>>false<</contributors__enabled>> <<verified>>false<</verified>> <<description>>http://basex.org<</description>> <<profile__sidebar__border__color>>C0DEED<</profile__sidebar__border__color>> <<name>>BaseX<</name>> <<profile__background__color>>C0DEED<</profile__background__color>> <<created__at>>Sat Feb 25 04:05:30 +0000 2012<</created__at>> <<default__profile__image>>true<</default__profile__image>> <<followers__count>>860<</followers__count>> <<geo__enabled>>false<</geo__enabled>> <<profile__image__url__https>>https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png<</profile__image__url__https>> <<profile__background__image__url>>http://a0.twimg.com/images/themes/theme1/bg.png<</profile__background__image__url>> <<profile__background__image__url__https>>https://si0.twimg.com/images/themes/theme1/bg.png<</profile__background__image__url__https>> <<follow__request__sent/>> <<url>>http://adf.ly/5ktAf<</url>> <<utc__offset/>> <<time__zone/>> <<notifications/>> <<friends__count>>2004<</friends__count>> <<profile__use__background__image>>true<</profile__use__background__image>> <<profile__sidebar__fill__color>>DDEEF6<</profile__sidebar__fill__color>> <<screen__name>>BaseX<</screen__name>> <<id__str>>5024566**<</id__str>> <<show__all__inline__media>>false<</show__all__inline__media>> <<profile__image__url>>http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png<</profile__image__url>> <<is__translator>>false<</is__translator>> <<listed__count>>0<</listed__count>> <</user>> <<coordinates/>><</json>></presyntaxhighlight>
= BaseX Performance =