https://docs.basex.org/api.php?action=feedcontributions&user=AW&feedformat=atomBaseX Documentation - User contributions [en]2024-03-29T13:23:23ZUser contributionsMediaWiki 1.34.0https://docs.basex.org/index.php?title=Twitter&diff=7788Twitter2012-06-22T12:24:20Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and presents some statistics on the Twitter data and on the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an attractive data source for all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint of the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, they have to be converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. Both versions are shown in the examples section ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]). To store the tweets, including their metadata, we use the standard ''insert'' expression of [[Updates|XQuery Update]].<br />
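<br />
A minimal sketch of the conversion step, assuming the JSON Module as described in this article (BaseX 7.3, with the <code>json</code> prefix statically bound). The JSON string is a made-up, heavily shortened tweet, and the exact shape of the result may differ in other versions:<br />
<br />
<pre class="brush:xquery">
(: Convert a JSON string into an XML fragment with the JSON Module. :)
(: The input is a hypothetical, shortened tweet object. :)
json:parse('{ "text": "Using BaseX for storing the Twitter Stream", "retweeted": false }')
</pre><br />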
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]).<br />
The following section shows the amount of data that the Twitter Streaming API delivers to endpoints connected with 10% gardenhose access, broken down per hour, on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
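<br />
The averages in the table can be reproduced from the daily totals. The following sketch assumes that each average was obtained by truncating integer division (by 24 hours, then 60 minutes, then 60 seconds), which matches the published figures:<br />
<br />
<pre class="brush:xquery">
(: Daily totals taken from the table above :)
for $total in (30824976, 31823776, 34638976, 35982976)
let $hour   := $total idiv 24    (: average tweets per hour :)
let $minute := $hour idiv 60     (: average tweets per minute :)
let $second := $minute idiv 60   (: average tweets per second :)
return string-join(
  for $v in ($total, $hour, $minute, $second) return string($v), ' '
)
</pre><br />
<br />
For the first day, this yields <code>30824976 1284374 21406 356</code>.<br />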
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
The tests show the time BaseX needs to insert large amounts of real tweets into a database. From the results, we conclude that BaseX scales well and can keep up<br />
with the incoming volume of tweets in the stream. Some lower values can occur because the size of the tweets differs according to the metadata contained in each tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option is set to <code>FALSE</code> (default: <code>SET AUTOFLUSH TRUE</code>)<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM <br/><br />
BaseX Version: BaseX 7.3 beta<br />
<br />
== Insert with XQuery Update ==<br />
<br />
These tests show the performance of BaseX when performing inserts with XQuery Update, either as a single update per tweet or as bulk updates with varying numbers of tweets.<br />
The initial database contained only a root node <code><tweets/></code>; each incoming tweet is converted from JSON to XML and then inserted into the root node.<br />
The time measured for the inserts includes the conversion time.<br />
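<br />
A minimal sketch of such an update, assuming a database named <code>twitter</code> (a hypothetical name) and an external variable <code>$tweets</code> that the calling client binds to one or more raw JSON strings. A single update corresponds to a sequence of one tweet, a bulk update to a longer sequence:<br />
<br />
<pre class="brush:xquery">
(: Hypothetical input: raw JSON strings, bound by the calling client :)
declare variable $tweets as xs:string* external;

(: Convert each tweet to XML and append it to the root node :)
insert nodes (
  for $tweet in $tweets
  return json:parse($tweet)
) into db:open('twitter')/tweets
</pre><br />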
<br />
=== Single Updates ===<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
! Database Size (without indexes)<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br />
| 3396 MB<br/><br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br />
| 6997 MB<br/><br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br />
| 10452 MB<br/><br />
|-<br />
|}<br />
<br />
[[File:insertTweets.png]]<br />
<br />
=== Bulk Updates ===<br />
<br />
coming soon...</div>AWhttps://docs.basex.org/index.php?title=Full-Text&diff=7558Full-Text2012-06-06T08:38:40Z<p>AW: </p>
<hr />
<div>This article is part of the [[XQuery|XQuery Portal]].<br />
It summarizes the full-text features of BaseX.<br />
<br />
Full-text retrieval is an essential query feature for working with XML documents, and BaseX was the first query processor that fully supported the [http://www.w3.org/TR/xpath-full-text-10/ W3C XQuery Full Text 1.0] Recommendation. This page lists some particularities and extensions of the BaseX implementation.<br />
<br />
==Query Evaluation== <br />
BaseX offers different evaluation strategies for XQFT queries, the choice of which<br />
depends on the input data and the existence of a full-text index. The query compiler tries<br />
to optimize and speed up queries by applying a full-text index structure whenever<br />
possible and useful. Three evaluation strategies are available: the standard sequential<br />
database scan, a full-text index based evaluation, and a hybrid one that combines both strategies (see [http://www.inf.uni-konstanz.de/gk/pubsys/publishedFiles/GrGaHo09.pdf "XQuery Full Text implementation in BaseX"]). <br />
Query optimization and the selection of the most efficient evaluation strategy are<br />
performed fully automatically. The output of the query optimizer indicates which<br />
evaluation plan is chosen for a specific query. It can be inspected by activating verbose<br />
querying (Command: <code>SET VERBOSE ON</code>) or by opening the Query Info in the GUI.<br />
The message<br />
<br />
<code>Applying full-text index</code><br />
<br />
suggests that the full-text index is applied to speed up query evaluation.<br />
A second message<br />
<br />
<code>Removing path with no index results</code><br />
<br />
indicates that the index does not yield any results for the specified term and<br />
is thus skipped. If index optimizations are missing, it sometimes helps to give<br />
the compiler a second chance and try different rewritings of the same query.<br />
<br />
==Options==<br />
<br />
The available full-text index can handle various combinations of the match options defined in the XQuery Full Text Recommendation. By default, most options are disabled. The GUI dialogs for creating new databases or displaying the database properties contain a tab for choosing between all available options. On the command line, the <code>SET</code> command can be used to activate full-text indexing for newly created databases, and the <code>CREATE INDEX</code> command creates a full-text index for an existing database:<br />
<br />
* <code>SET FTINDEX true; CREATE DB input.xml</code><br />
* <code>CREATE INDEX fulltext</code><br />
<br />
The following indexing options are available: <br />
<br />
* '''Language''': [[#Languages|see below]] for more details (<code>SET LANGUAGE EN</code>).<br />
* '''Stemming''': tokens are stemmed with the Porter Stemmer before being indexed (<code>SET STEMMING true</code>).<br />
* '''Case Sensitive''': tokens are indexed in case-sensitive mode (<code>SET CASESENS true</code>).<br />
* '''Diacritics''': diacritics are indexed as well (<code>SET DIACRITICS true</code>).<br />
* '''Stopword List''': a stop word list can be defined to reduce the number of indexed tokens (<code>SET STOPWORDS [filename]</code>).<br />
* {{Mark|Removed in Version 7.3}}: '''TF/IDF Scoring''': TF/IDF-based scoring values are calculated and stored in the index (<code>SET SCORING 0/1/2</code>). This feature was removed in favor of the internal scoring model; [[#Scoring|see below]] for more details.<br />
* {{Mark|Removed in Version 7.3}}: '''Support Wildcards''': a trie-based index can be applied to support wildcard searches (<code>SET WILDCARDS true</code>). This option was discarded, as the index now supports both wildcard and fuzzy queries.<br />
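<br />
Several of the indexing options above have counterparts among the match options of the XQuery Full Text Recommendation, which can also be stated directly in a query. A small sketch with literal strings, so that no database or index is involved:<br />
<br />
<pre class="brush:xquery">
(: Case is ignored unless requested otherwise :)
"BaseX" contains text "basex",
(: Explicit case-sensitive comparison :)
"BaseX" contains text "basex" using case sensitive
</pre><br />
<br />
The first comparison returns <code>true</code>, the second <code>false</code>.<br />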
<br />
==Languages==<br />
<br />
The chosen language determines how the input text will be tokenized and stemmed. The basic code base and <code>jar</code> file of BaseX come with built-in support for English and German. More languages are supported if the following libraries are found in the classpath:<br />
<br />
* [http://files.basex.org/maven/org/apache/lucene-stemmers/3.4.0/lucene-stemmers-3.4.0.jar lucene-stemmers-3.4.0.jar]: includes Snowball and Lucene stemmers and extends language support to the following languages: Arabic, Bulgarian, Catalan, Czech, Danish, Dutch, Finnish, French, Hindi, Hungarian, Italian, Latvian, Lithuanian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish.<br />
<br />
* [http://en.sourceforge.jp/projects/igo/releases/ igo-0.4.3.jar]: [[Full-Text: Japanese|An additional article]] explains how Igo can be integrated, and how Japanese texts are tokenized and stemmed.<br />
<br />
The JAR files can also be found in the <code>zip</code> and <code>exe</code> distribution files of BaseX.<br />
<br />
The following two queries, which both return <code>true</code>, demonstrate that stemming depends on the selected language:<br />
<br />
<pre class="brush:xquery"><br />
"Indexing" contains text "index" using stemming,<br />
"häuser" contains text "haus" using stemming using language "de"<br />
</pre><br />
<br />
==Scoring== <br />
<br />
The XQuery Full Text Recommendation allows for the usage of scoring models<br />
and values within queries, with scoring being completely implementation defined.<br />
BaseX offers an internal scoring model which can be extended to<br />
different application scenarios. <br />
<br />
{{Mark|Updated in Version 7.3:}}<br />
TF/IDF scoring was discarded in favor of the internal scoring model, which proved to yield better results for XML documents in most cases. The score of a full-text result is calculated by taking the number of found terms and their frequency in a single text node into account. Terms will be ranked higher if they are found in short texts.<br />
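<br />
Scores can be bound to a variable with the <code>score</code> keyword of the Recommendation. A minimal sketch with two made-up strings; the concrete score values are implementation defined and therefore not shown:<br />
<br />
<pre class="brush:xquery">
(: Rank two sample texts by their full-text score for the term 'house' :)
for $text in ('house', 'a house on a hill far away')
let score $s := $text contains text 'house'
order by $s descending
return $text
</pre><br />
<br />
According to the description above, the shorter text would be expected to rank first.<br />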
<br />
==Thesaurus== <br />
<br />
BaseX supports full-text queries using thesauri, but it does not provide a default thesaurus. This is why a query such as<br />
<br />
<pre class="brush:xquery"><br />
'computers' contains text 'hardware'<br />
using thesaurus default<br />
</pre><br />
<br />
will return <code>false</code>. However, if the thesaurus is specified, the result will be <code>true</code>:<br />
<br />
<pre class="brush:xquery"><br />
'computers' contains text 'hardware'<br />
using thesaurus at 'XQFTTS_1_0_4/TestSources/usability2.xml'<br />
</pre><br />
<br />
The format of the thesaurus files must be the same as the format of the thesauri provided by the [http://dev.w3.org/2007/xpath-full-text-10-test-suite XQuery and XPath Full Text 1.0 Test Suite]. A thesaurus is an XML document whose structure is defined by an [http://dev.w3.org/cvsweb/~checkout~/2007/xpath-full-text-10-test-suite/TestSuiteStagingArea/TestSources/thesaurus.xsd?rev=1.3;content-type=application%2Fxml XSD Schema].<br />
<br />
==Fuzzy Querying==<br />
In addition to the official recommendation, BaseX supports fuzzy querying.<br />
The XQFT grammar was extended by the FTMatchOption <code>using fuzzy</code><br />
to allow for approximate searches in full texts.<br />
By default, the standard [[indexes|full-text index]] already supports the efficient<br />
execution of fuzzy searches.<br />
<br />
'''Document 'doc.xml'''':<br />
<pre class="brush:xml"><br />
<doc><br />
<a>house</a><br />
<a>hous</a><br />
<a>haus</a><br />
</doc><br />
</pre> <br />
'''Command:''' <code>CREATE DB doc.xml; CREATE INDEX fulltext</code> <br />
<br />
'''Query:'''<br />
<pre class="brush:xquery"><br />
//a[text() contains text 'house' using fuzzy]<br />
</pre><br />
<br />
'''Result:'''<br />
<pre class="brush:xml"><br />
<a>house</a><br />
<a>hous</a><br />
</pre><br />
<br />
Fuzzy search is based on the Levenshtein distance. The maximum number of allowed<br />
errors is calculated by dividing the token length of a specified query term by 4,<br />
preserving a minimum of 1 error. A static error distance can be set by adjusting<br />
the <code>[[Options#LSERROR|LSERROR]]</code> property (default: <code>SET LSERROR 0</code>).<br />
The query above yields two results as there is no error between the query term<br />
“house” and the text node “house”, and one error between<br />
“house” and “hous”.<br />
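<br />
The rule for the maximum number of errors can be sketched in plain XQuery. This is a minimal illustration of the description above and not the internal BaseX code: the function name <code>local:max-errors</code> is hypothetical, and the use of integer division (<code>idiv</code>) is an assumption about how the quotient is rounded:<br />
<br />
<pre class="brush:xquery">
(: Sketch only: token length divided by 4, with a minimum of 1.
   Rounding down via idiv is an assumption. :)
declare function local:max-errors($term as xs:string) as xs:integer {
  max((1, string-length($term) idiv 4))
};

for $term in ('haus', 'house', 'documentation')
return $term || ': ' || local:max-errors($term)
(: yields "haus: 1", "house: 1", "documentation: 3" :)
</pre><br />
<br />
With one allowed error for “house”, the text node “haus” is not returned by the query above, as it differs from the query term by two edits.<br />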
<br />
==Mixed Content==<br />
<br />
When working with so-called narrative XML documents, such as HTML, [http://tei-c.org/ TEI], or [http://docbook.org DocBook] documents, you typically have ''mixed content'', i.e., elements containing a mix of text and markup, such as:<br />
<br />
<pre class="brush:xml"><br />
<p>This is only an illustrative <hi>example</hi>, not a <q>real</q> text.</p><br />
</pre><br />
<br />
Since the logical flow of the text is not interrupted by the child elements, you will typically want to search across elements, so that the above paragraph would match a search for “real text”. For more examples, see [http://www.w3.org/TR/xpath-full-text-10-use-cases/#Across XQuery and XPath Full Text 1.0 Use Cases].<br />
<br />
To enable this kind of search, ''whitespace chopping'' must be turned off when importing XML documents by setting the option <code>[[Options#CHOP|CHOP]]</code> to <code>OFF</code> (default: <code>SET CHOP ON</code>). In the GUI, you find this option in Database → New… → Parsing → Chop Whitespaces. A query such as <code>//p[. contains text 'real text']</code> will then match the example paragraph above.<br />
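<br />
The reason for the match becomes visible when the string value of the paragraph is inspected. The following sketch builds the example paragraph as a main-memory fragment, so it does not depend on a database or on the <code>CHOP</code> setting. It is expected to return the concatenated text, followed by <code>true</code>:<br />
<br />
<pre class="brush:xquery">
(: The string value joins all descendant text nodes,
   so "real" and "text" end up next to each other. :)
let $p := <p>This is only an illustrative <hi>example</hi>, not a <q>real</q> text.</p>
return (
  string($p),
  $p contains text 'real text'
)
</pre><br />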
<br />
Note that the node structure is completely ignored by the full-text tokenizer: The {{Code|contains text}} expression applies all full-text operations to the ''string value'' of its left operand. As a consequence, the <code>ft:mark</code> and <code>ft:extract</code> functions (see [[Full-Text Module|Full-Text Functions]]) will only yield useful results if they are applied to single text nodes, as the following example demonstrates:<br />
<br />
<pre class="brush:xquery"><br />
(: Structure is ignored; no highlighting: :)<br />
ft:mark(//p[. contains text 'real'])<br />
(: Single text nodes are addressed: results will be highlighted: :)<br />
ft:mark(//p[.//text() contains text 'real'])<br />
</pre><br />
<br />
Note that BaseX does '''not''' support the ''ignore option'' (<code>without content</code>) of the [http://www.w3.org/TR/xpath-full-text-10/#ftignoreoption W3C XQuery Full Text 1.0] Recommendation. This means that it is not possible to ignore descendant element content, such as footnotes or other material that does not belong to the same logical text flow. Here is an example document:<br />
<br />
<pre class="brush:xml"><br />
<p>This text is provided for illustrative<note>Serving as an example or explanation.</note> purposes only.</p><br />
</pre><br />
<br />
The ignore option would enable you to search for the string “illustrative purposes”:<br />
<br />
<pre class="brush:xquery"><br />
//p[. contains text 'illustrative purposes' without content note]<br />
</pre><br />
<br />
For more examples, see [http://www.w3.org/TR/xpath-full-text-10-use-cases/#Ignore XQuery and XPath Full Text 1.0 Use Cases].<br />
<br />
As BaseX does not support the ignore option, it raises error [[XQuery_Errors#Full-Text_Errors|FTST0007]] when it encounters <code>without content</code> in a full-text <code>contains</code> expression.<br />
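<br />
One possible workaround, which is not part of the original description and is given here only as a sketch, is to remove the unwanted elements from a copy of the node with the <code>copy/modify</code> expression of [[Updates|XQuery Update]] and to run the full-text test on that copy. The example uses the paragraph from above as a main-memory fragment:<br />
<br />
<pre class="brush:xquery">
(: Sketch: delete all note elements in a copy, then search the copy.
   The original node $p remains unchanged. :)
let $p := <p>This text is provided for illustrative<note>Serving as an example or explanation.</note> purposes only.</p>
let $stripped := (
  copy $c := $p
  modify delete node $c//note
  return $c
)
return $stripped contains text 'illustrative purposes'
</pre><br />
<br />
As every candidate node is copied, this approach may be too slow for large result sets, and the full-text index is not used for the copied nodes.<br />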
<br />
==Functions==<br />
<br />
Some additional [[Full-Text Module|Full-Text Functions]] have been added to BaseX to extend the official language recommendation with useful features, such as explicitly requesting the score value of an item, marking the hits of a full-text request, or directly accessing the full-text index with the default index options.<br />
<br />
=Changelog=<br />
<br />
; Version 7.3:<br />
<br />
* Removed: The trie index, which was specialized on wildcard queries, was removed. The fuzzy index now supports both wildcard and fuzzy queries.<br />
* Removed: TF/IDF scoring was discarded in favor of the internal scoring model.<br />
<br />
[[Category:XQuery]]</div>AWhttps://docs.basex.org/index.php?title=Full-Text_Module&diff=7556Full-Text Module2012-06-05T13:17:15Z<p>AW: </p>
<hr />
<div>This [[Module Library|XQuery Module]] extends the [http://www.w3.org/TR/xpath-full-text-10 W3C Full Text Recommendation] with some useful functions: The index can be directly accessed, full-text results can be marked with additional elements, or the relevant parts can be extracted. Moreover, the score value, which is generated by the {{Code|contains text}} expression, can be explicitly requested from items.<br />
<br />
=Conventions=<br />
<br />
All functions in this module are assigned to the {{Code|http://basex.org/modules/ft}} namespace, which is statically bound to the {{Code|ft}} prefix.<br/><br />
All errors are assigned to the {{Code|http://basex.org/errors}} namespace, which is statically bound to the {{Code|bxerr}} prefix.<br />
<br />
=Functions=<br />
<br />
==ft:search==<br />
<br />
{{Mark|Updated with Version 7.3:}} second argument generalized, third parameter added.<br />
<br />
{|<br />
|-<br />
| width='90' | '''Signatures'''<br />
|{{Func|ft:search|$db as item(), $terms as item()*|text()*}}<br/>{{Func|ft:search|$db as item(), $terms as item()*, $options as item()|text()*}}<br />
|-<br />
| '''Summary'''<br />
|Returns all text nodes from the full-text index of the [[Database Module#Database Nodes|database node]] {{Code|$db}} that contain the specified {{Code|$terms}}.<br/>The options used for building the full-text index will also be applied to the search terms. As an example, if the index terms have been stemmed, the search string will be stemmed as well.<br />
The {{Code|$options}} argument can be used to overwrite the default full-text options, which can be either specified<br/><br />
* as children of an {{Code|&lt;options/&gt;}} element, e.g.:<br />
<pre class="brush:xml"><br />
<options><br />
<key1 value='value1'/><br />
...<br />
</options><br />
</pre><br />
* as map, which contains all key/value pairs:<br />
<pre class="brush:xml"><br />
map { "key1" := "value1", ... }<br />
</pre><br />
The following keys are supported:<br />
* {{Code|mode}}: determines the search mode (also called [http://www.w3.org/TR/xpath-full-text-10/#ftwords AnyAllOption]). Allowed values are {{Code|any}}, {{Code|any word}}, {{Code|all}}, {{Code|all words}}, and {{Code|phrase}}. {{Code|any}} is the default search mode.<br />
* {{Code|fuzzy}}: turns fuzzy querying on or off. Allowed values are an empty string or {{Code|true}}, or {{Code|false}}. By default, fuzzy querying is turned off.<br />
* {{Code|wildcards}}: turns wildcard querying on or off. Allowed values are an empty string or {{Code|true}}, or {{Code|false}}. By default, wildcard querying is turned off.<br />
|-<br />
| '''Errors'''<br />
|{{Error|BXDB0004|Database Module#Errors}} the full-text index is not available.<br/>{{Error|BXFT0001|#Errors}} both fuzzy and wildcard querying were selected.<br />
|-<br />
| '''Examples'''<br />
|<br />
* {{Code|ft:search("DB", "QUERY")}} returns all text nodes of the database {{Code|DB}} that contain the term {{Code|QUERY}}.<br />
* <code>ft:search("DB", ("2010","2011"), map { 'mode':='all' })</code><br/>returns all text nodes of the database {{Code|DB}} that contain the numbers {{Code|2010}} and {{Code|2011}}.<br />
* The last example iterates over three databases and returns all elements containing terms similar to {{Code|Hello World}} in the text nodes:<br />
<pre class="brush:xquery"><br />
let $terms := "Hello Worlds"<br />
let $fuzzy := true()<br />
let $options :=<br />
<options><br />
<fuzzy>{ $fuzzy }</fuzzy><br />
</options><br />
for $db in 1 to 3<br />
let $dbname := 'DB' || $db<br />
return ft:search($dbname, $terms, $options)/..<br />
</pre><br />
|}<br />
<br />
==ft:mark==<br />
{|<br />
|-<br />
| width='90' | '''Signatures'''<br />
|{{Func|ft:mark|$nodes as node()*|node()*}}<br />{{Func|ft:mark|$nodes as node()*, $tag as xs:string|node()*}}<br />
|-<br />
| '''Summary'''<br />
|Puts a marker element around the resulting {{Code|$nodes}} of a full-text index request.<br />The default tag name of the marker element is {{Code|mark}}. An alternative tag name can be chosen via the optional {{Code|$tag}} argument.<br />Note that the XML node to be transformed must be an internal "database" node. The {{Code|transform}} expression can be used to apply the method to a main-memory fragment (see example).<br />
|-<br />
| '''Examples'''<br />
|<br />
* The following query returns {{Code|&lt;XML&gt;&lt;mark&gt;hello&lt;/mark&gt; world&lt;/XML&gt;}}, if one text node of the database {{Code|DB}} has the value "hello world":<br />
<pre class="brush:xquery"><br />
ft:mark(db:open('DB')//*[text() contains text 'hello'])<br />
</pre><br />
* The following expression returns {{Code|&lt;p&gt;&lt;b&gt;word&lt;/b&gt;&lt;/p&gt;}}:<br />
<pre class="brush:xquery"><br />
copy $p := &lt;p&gt;word&lt;/p&gt;<br />
modify ()<br />
return ft:mark($p[text() contains text 'word'], 'b')</pre><br />
|}<br />
<br />
==ft:extract==<br />
{|<br />
|-<br />
| width='90' | '''Signatures'''<br />
|{{Func|ft:extract|$nodes as node()*|node()*}}<br />{{Func|ft:extract|$nodes as node()*, $tag as xs:string|node()*}}<br />{{Func|ft:extract|$nodes as node()*, $tag as xs:string, $length as xs:integer|node()*}}<br />
|-<br />
| '''Summary'''<br />
|Extracts and returns relevant parts of full-text results. It puts a marker element around the resulting {{Code|$nodes}} of a full-text index request and chops irrelevant sections of the result.<br />The default tag name of the marker element is {{Code|mark}}. An alternative tag name can be chosen via the optional {{Code|$tag}} argument.<br />The default length of the returned text is {{Code|150}} characters. An alternative length can be specified via the optional {{Code|$length}} argument. Note that the effective text length may differ from the specified length due to formatting and readability issues.<br />
|-<br />
| '''Examples'''<br />
|<br />
* The following query may return {{Code|&lt;XML&gt;...&lt;b&gt;hello&lt;/b&gt;...&lt;/XML&gt;}} if a text node of the database {{Code|DB}} contains the string "hello world":<br />
<pre class="brush:xquery"><br />
ft:extract(db:open('DB')//*[text() contains text 'hello'], 'b', 1)<br />
</pre><br />
|}<br />
<br />
==ft:count==<br />
{|<br />
|-<br />
| width='90' | '''Signatures'''<br />
|{{Func|ft:count|$nodes as node()*|xs:integer}}<br />
|-<br />
| '''Summary'''<br />
|Returns the number of occurrences of the search terms specified in a full-text expression.<br />
|-<br />
| '''Examples'''<br />
|<br />
* {{Code|ft:count(//*[text() contains text 'QUERY'])}} returns the {{Code|xs:integer}} value {{Code|2}} if a document contains two occurrences of the string "QUERY".<br />
|}<br />
<br />
==ft:score==<br />
{|<br />
|-<br />
| width='90' | '''Signatures'''<br />
|{{Func|ft:score|$item as item()*|xs:double*}}<br />
|-<br />
| '''Summary'''<br />
|Returns the score values (0.0 - 1.0) that have been attached to the specified items. {{Code|0}} is returned as value if no score was attached.<br />
|-<br />
| '''Examples'''<br />
|<br />
* {{Code|ft:score('a' contains text 'a')}} returns the {{Code|xs:double}} value {{Code|1}}.<br />
|}<br />
<br />
==ft:tokens==<br />
{|<br />
|-<br />
| width='90' | '''Signatures'''<br />
|{{Func|ft:tokens|$db as item()|element(value)*}}<br/>{{Func|ft:tokens|$db as item(), $prefix as xs:string|element(value)*}}<br />
|-<br />
| '''Summary'''<br />
|Returns all full-text tokens stored in the index of the [[Database Module#Database Nodes|database node]] {{Code|$db}}, along with their numbers of occurrences.<br/>If {{Code|$prefix}} is specified, the returned nodes will be refined to the strings starting with that prefix. The prefix will be tokenized according to the full-text options used for creating the index.<br />
|-<br />
| '''Errors'''<br />
|{{Error|BXDB0004|Database Module#Errors}} the full-text index is not available.<br />
|}<br />
<br />
==ft:tokenize==<br />
{|<br />
|-<br />
| width='90' | '''Signatures'''<br />
|{{Func|ft:tokenize|$input as xs:string|xs:string*}}<br />
|-<br />
| '''Summary'''<br />
|Tokenizes the given {{Code|$input}} string, using the current default full-text options.<br />
|-<br />
| '''Examples'''<br />
|<br />
* {{Code|ft:tokenize("No Doubt")}} returns the two strings {{Code|no}} and {{Code|doubt}}.<br />
* {{Code|declare ft-option using stemming; ft:tokenize("GIFTS")}} returns a single string {{Code|gift}}.<br />
|}<br />
<br />
=Errors=<br />
<br />
{| class="wikitable" width="100%"<br />
! width="5%"|Code<br />
! width="95%"|Description<br />
|-<br />
|{{Code|BXFT0001}}<br />
|Both wildcards and fuzzy search have been specified as search options.<br />
|}<br />
<br />
=Changelog=<br />
<br />
;Version 7.2<br />
<br />
* Updated: [[#ft:search|ft:search]] (second argument generalized, third parameter added)<br />
<br />
;Version 7.1<br />
<br />
* Added: [[#ft:tokens|ft:tokens]], [[#ft:tokenize|ft:tokenize]]<br />
<br />
[[Category:XQuery]]</div>AWhttps://docs.basex.org/index.php?title=Full-Text_Module&diff=7555Full-Text Module2012-06-05T13:15:56Z<p>AW: </p>
<hr />
<div>This [[Module Library|XQuery Module]] extends the [http://www.w3.org/TR/xpath-full-text-10 W3C Full Text Recommendation] with some useful functions: The index can be directly accessed, full-text results can be marked with additional elements, or the relevant parts can be extracted. Moreover, the score value, which is generated by the {{Code|contains text}} expression, can be explicitly requested from items.<br />
<br />
=Conventions=<br />
<br />
All functions in this module are assigned to the {{Code|http://basex.org/modules/ft}} namespace, which is statically bound to the {{Code|ft}} prefix.<br/><br />
All errors are assigned to the {{Code|http://basex.org/errors}} namespace, which is statically bound to the {{Code|bxerr}} prefix.<br />
<br />
=Functions=<br />
<br />
==ft:search==<br />
<br />
{{Mark|Updated with Version 7.3:}} second argument generalized, third parameter added.<br />
<br />
{|<br />
|-<br />
| width='90' | '''Signatures'''<br />
|{{Func|ft:search|$db as item(), $terms as item()*|text()*}}<br/>{{Func|ft:search|$db as item(), $terms as item()*, $options as item()|text()*}}<br />
|-<br />
| '''Summary'''<br />
|Returns all text nodes from the full-text index of the [[Database Module#Database Nodes|database node]] {{Code|$db}} that contain the specified {{Code|$terms}}.<br/>The options used for building the full-text index will also be applied to the search terms. As an example, if the index terms have been stemmed, the search string will be stemmed as well.<br />
The {{Code|$options}} argument can be used to overwrite the default full-text options, which can be either specified<br/><br />
* as children of an {{Code|&lt;options/&gt;}} element, e.g.:<br />
<pre class="brush:xml"><br />
<options><br />
<key1 value='value1'/><br />
...<br />
</options><br />
</pre><br />
* as map, which contains all key/value pairs:<br />
<pre class="brush:xml"><br />
map { "key1" := "value1", ... }<br />
</pre><br />
The following keys are supported:<br />
* {{Code|mode}}: determines the search mode (also called [http://www.w3.org/TR/xpath-full-text-10/#ftwords AnyAllOption]). Allowed values are {{Code|any}}, {{Code|any word}}, {{Code|all}}, {{Code|all words}}, and {{Code|phrase}}. {{Code|any}} is the default search mode.<br />
* {{Code|fuzzy}}: turns fuzzy querying on or off. Allowed values are an empty string or {{Code|true}}, or {{Code|false}}. By default, fuzzy querying is turned off.<br />
* {{Code|wildcards}}: turns wildcard querying on or off. Allowed values are an empty string or {{Code|true}}, or {{Code|false}}. By default, wildcard querying is turned off.<br />
|-<br />
| '''Errors'''<br />
|{{Error|BXDB0004|Database Module#Errors}} the full-text index is not available.<br/>{{Error|BXFT0001|#Errors}} both fuzzy and wildcard querying were selected.<br />
|-<br />
| '''Examples'''<br />
|<br />
* {{Code|ft:search("DB", "QUERY")}} returns all text nodes of the database {{Code|DB}} that contain the term {{Code|QUERY}}.<br />
* <code>ft:search("DB", (2010,2011), map { 'mode':='all' })</code><br/>returns all text nodes of the database {{Code|DB}} that contain the numbers {{Code|2010}} and {{Code|2011}}.<br />
* The last example iterates over three databases and returns all elements containing terms similar to {{Code|Hello World}} in the text nodes:<br />
<pre class="brush:xquery"><br />
let $terms := "Hello Worlds"<br />
let $fuzzy := true()<br />
let $options :=<br />
<options><br />
<fuzzy>{ $fuzzy }</fuzzy><br />
</options><br />
for $db in 1 to 3<br />
let $dbname := 'DB' || $db<br />
return ft:search($dbname, $terms, $options)/..<br />
</pre><br />
|}<br />
<br />
==ft:mark==<br />
{|<br />
|-<br />
| width='90' | '''Signatures'''<br />
|{{Func|ft:mark|$nodes as node()*|node()*}}<br />{{Func|ft:mark|$nodes as node()*, $tag as xs:string|node()*}}<br />
|-<br />
| '''Summary'''<br />
|Puts a marker element around the resulting {{Code|$nodes}} of a full-text index request.<br />The default tag name of the marker element is {{Code|mark}}. An alternative tag name can be chosen via the optional {{Code|$tag}} argument.<br />Note that the XML node to be transformed must be an internal "database" node. The {{Code|transform}} expression can be used to apply the method to a main-memory fragment (see example).<br />
|-<br />
| '''Examples'''<br />
|<br />
* The following query returns {{Code|&lt;XML&gt;&lt;mark&gt;hello&lt;/mark&gt; world&lt;/XML&gt;}}, if one text node of the database {{Code|DB}} has the value "hello world":<br />
<pre class="brush:xquery"><br />
ft:mark(db:open('DB')//*[text() contains text 'hello'])<br />
</pre><br />
* The following expression returns {{Code|&lt;p&gt;&lt;b&gt;word&lt;/b&gt;&lt;/p&gt;}}:<br />
<pre class="brush:xquery"><br />
copy $p := &lt;p&gt;word&lt;/p&gt;<br />
modify ()<br />
return ft:mark($p[text() contains text 'word'], 'b')</pre><br />
|}<br />
<br />
==ft:extract==<br />
{|<br />
|-<br />
| width='90' | '''Signatures'''<br />
|{{Func|ft:extract|$nodes as node()*|node()*}}<br />{{Func|ft:extract|$nodes as node()*, $tag as xs:string|node()*}}<br />{{Func|ft:extract|$nodes as node()*, $tag as xs:string, $length as xs:integer|node()*}}<br />
|-<br />
| '''Summary'''<br />
|Extracts and returns relevant parts of full-text results. It puts a marker element around the resulting {{Code|$nodes}} of a full-text index request and chops irrelevant sections of the result.<br />The default tag name of the marker element is {{Code|mark}}. An alternative tag name can be chosen via the optional {{Code|$tag}} argument.<br />The default length of the returned text is {{Code|150}} characters. An alternative length can be specified via the optional {{Code|$length}} argument. Note that the effective text length may differ from the specified length due to formatting and readability issues.<br />
|-<br />
| '''Examples'''<br />
|<br />
* The following query may return {{Code|&lt;XML&gt;...&lt;b&gt;hello&lt;/b&gt;...&lt;/XML&gt;}} if a text node of the database {{Code|DB}} contains the string "hello world":<br />
<pre class="brush:xquery"><br />
ft:extract(db:open('DB')//*[text() contains text 'hello'], 'b', 1)<br />
</pre><br />
|}<br />
<br />
==ft:count==<br />
{|<br />
|-<br />
| width='90' | '''Signatures'''<br />
|{{Func|ft:count|$nodes as node()*|xs:integer}}<br />
|-<br />
| '''Summary'''<br />
|Returns the number of occurrences of the search terms specified in a full-text expression.<br />
|-<br />
| '''Examples'''<br />
|<br />
* {{Code|ft:count(//*[text() contains text 'QUERY'])}} returns the {{Code|xs:integer}} value {{Code|2}} if a document contains two occurrences of the string "QUERY".<br />
|}<br />
<br />
==ft:score==<br />
{|<br />
|-<br />
| width='90' | '''Signatures'''<br />
|{{Func|ft:score|$item as item()*|xs:double*}}<br />
|-<br />
| '''Summary'''<br />
|Returns the score values (0.0 - 1.0) that have been attached to the specified items. {{Code|0}} is returned as value if no score was attached.<br />
|-<br />
| '''Examples'''<br />
|<br />
* {{Code|ft:score('a' contains text 'a')}} returns the {{Code|xs:double}} value {{Code|1}}.<br />
|}<br />
<br />
==ft:tokens==<br />
{|<br />
|-<br />
| width='90' | '''Signatures'''<br />
|{{Func|ft:tokens|$db as item()|element(value)*}}<br/>{{Func|ft:tokens|$db as item(), $prefix as xs:string|element(value)*}}<br />
|-<br />
| '''Summary'''<br />
|Returns all full-text tokens stored in the index of the [[Database Module#Database Nodes|database node]] {{Code|$db}}, along with their numbers of occurrences.<br/>If {{Code|$prefix}} is specified, the returned nodes will be refined to the strings starting with that prefix. The prefix will be tokenized according to the full-text options used for creating the index.<br />
|-<br />
| '''Errors'''<br />
|{{Error|BXDB0004|Database Module#Errors}} the full-text index is not available.<br />
|}<br />
<br />
==ft:tokenize==<br />
{|<br />
|-<br />
| width='90' | '''Signatures'''<br />
|{{Func|ft:tokenize|$input as xs:string|xs:string*}}<br />
|-<br />
| '''Summary'''<br />
|Tokenizes the given {{Code|$input}} string, using the current default full-text options.<br />
|-<br />
| '''Examples'''<br />
|<br />
* {{Code|ft:tokenize("No Doubt")}} returns the two strings {{Code|no}} and {{Code|doubt}}.<br />
* {{Code|declare ft-option using stemming; ft:tokenize("GIFTS")}} returns a single string {{Code|gift}}.<br />
|}<br />
<br />
=Errors=<br />
<br />
{| class="wikitable" width="100%"<br />
! width="5%"|Code<br />
! width="95%"|Description<br />
|-<br />
|{{Code|BXFT0001}}<br />
|Both wildcards and fuzzy search have been specified as search options.<br />
|}<br />
<br />
=Changelog=<br />
<br />
;Version 7.2<br />
<br />
* Updated: [[#ft:search|ft:search]] (second argument generalized, third parameter added)<br />
<br />
;Version 7.1<br />
<br />
* Added: [[#ft:tokens|ft:tokens]], [[#ft:tokenize|ft:tokenize]]<br />
<br />
[[Category:XQuery]]</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7418Twitter2012-05-30T08:29:59Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It is about the usage of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving the data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose the parse function of the [[JSON Module|XQuery JSON Module]] is used. For storing the tweets including the meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
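<br />
The two steps can be sketched as a single updating query. This is a minimal illustration and not the code used for the measurements: the database name <code>tweets</code>, its root element <code>tweets</code>, and the shortened tweet string are hypothetical:<br />
<br />
<pre class="brush:xquery">
(: Assumption: a database 'tweets' exists and has a root element <tweets/>. :)
let $tweet := '{ "text": "Using BaseX for storing the Twitter Stream", "retweeted": false }'
(: Convert the JSON string to XML and append it to the root element. :)
return insert node json:parse($tweet) into db:open('tweets')/tweets
</pre><br />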
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). Both versions of a tweet are shown in the examples section ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data that the Twitter Streaming API delivers per hour to connected endpoints with 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
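<br />
The averages in the table are consistent with successive integer divisions of the daily totals: by 24 for hours, then by 60 for minutes, and by 60 again for seconds. That the values were actually derived this way is an assumption. The following sketch recomputes them, with the totals copied from the table and the thousands separators removed:<br />
<br />
<pre class="brush:xquery">
(: Daily totals for 6 Feb, 6 Mar, 6 Apr and 6 May 2012 :)
for $total in (30824976, 31823776, 34638976, 35982976)
let $hour   := $total idiv 24
let $minute := $hour idiv 60
let $second := $minute idiv 60
return string-join(
  for $n in ($total, $hour, $minute, $second) return string($n), ' '
)
(: first result: "30824976 1284374 21406 356" :)
</pre><br />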
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
The tests show the time BaseX needs to insert large numbers of real tweets into a database. The results indicate that BaseX scales well and can keep up<br />
with the rate of incoming tweets in the stream. Some values can be lower because tweets differ in size, depending on the meta-data contained in the tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option is set to <code>FALSE</code> (default: <code>SET AUTOFLUSH TRUE</code>)<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM <br/><br />
BaseX Version: BaseX 7.3 beta<br />
<br />
== Insert with XQuery Update ==<br />
<br />
These tests show the performance of BaseX when performing inserts with XQuery Update, either as single updates per tweet or as bulk updates with different numbers of tweets.<br />
The initial database contained only a root node <code><tweets/></code>; each incoming tweet is converted from JSON to XML and then inserted into the root node.<br />
The time needed for the inserts includes the conversion time.<br />
<br />
=== Single Updates ===<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Number of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
! Database size (without indexes)<br />
|-<br />
| 1,000,000<br />
| 492.26346<br />
| 8.2<br />
| 3396 MB<br/><br />
|-<br />
| 2,000,000<br />
| 461.87326<br />
| 7.6<br />
| 6997 MB<br/><br />
|-<br />
| 3,000,000<br />
| 470.7054<br />
| 7.8<br />
| 10452 MB<br/><br />
|}<br />
<br />
[[File:insertTweets.png]]<br />
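<br />
The claim that BaseX can keep up with the stream can be checked with simple arithmetic. The following Python sketch is only an illustration: it takes the first row of the table above (1,000,000 tweets in 492.26346 seconds) and compares the resulting insert rate with the 416 tweets per second listed for 6 May 2012 in the Statistics table. The variable names are chosen for this example and are not part of BaseX.<br />

```python
# Figures copied from this page: first row of the "Single Updates" table
# and the average stream rate for Sun, 6-May-2012 from the Statistics table.
TWEETS_INSERTED = 1_000_000
INSERT_SECONDS = 492.26346
STREAM_TWEETS_PER_SECOND = 416

# Tweets inserted per second, including the JSON-to-XML conversion time.
insert_rate = TWEETS_INSERTED / INSERT_SECONDS

# How many times faster the inserts are than the incoming stream.
headroom = insert_rate / STREAM_TWEETS_PER_SECOND

print(f"{insert_rate:.0f} tweets/s inserted, {headroom:.1f}x the stream rate")
```

A headroom factor above 1 means that, on this setup, single inserts are faster than the average arrival rate of the gardenhose stream.<br />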
<br />
=== Bulk Updates ===<br />
<br />
coming soon...</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7417Twitter2012-05-30T08:29:46Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It is about the usage of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to Twitter's Streaming API endpoint and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including their meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both forms ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data delivered per hour by the Twitter Streaming API to connected endpoints with the 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30,824,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,284,374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21,406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31,823,776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,325,990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22,099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34,638,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,443,290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24,054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35,982,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,499,290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24,988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|}<br />
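<br />
The averages in the table follow from the daily totals. The Python sketch below is an illustration only: the function name <code>averages</code> is made up for this example, and the use of truncating division at each step is an assumption that happens to reproduce the published figures, not a documented description of how they were computed.<br />

```python
# Daily totals copied from the Statistics table (10% gardenhose stream).
DAILY_TOTALS = {
    "Mon, 6-Feb-2012": 30_824_976,
    "Tue, 6-Mar-2012": 31_823_776,
    "Fri, 6-Apr-2012": 34_638_976,
    "Sun, 6-May-2012": 35_982_976,
}


def averages(total_per_day):
    """Return (per hour, per minute, per second), truncating at each step."""
    per_hour = total_per_day // 24    # 24 hours per day
    per_minute = per_hour // 60       # 60 minutes per hour
    per_second = per_minute // 60     # 60 seconds per minute
    return per_hour, per_minute, per_second


for day, total in DAILY_TOTALS.items():
    print(day, averages(total))
```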
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
The tests show the time BaseX needs to insert large numbers of real tweets into a database. The results indicate that BaseX scales well and can keep up<br />
with the rate of incoming tweets in the stream. Some values can be lower because tweets differ in size, depending on the meta-data contained in the tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option is set to <code>FALSE</code> (default: <code>SET AUTOFLUSH TRUE</code>)<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
BaseX Version: BaseX 7.3 beta<br />
<br />
== Insert with XQuery Update ==<br />
<br />
These tests show the performance of BaseX when performing inserts with XQuery Update, either as single updates per tweet or as bulk updates with different numbers of tweets.<br />
The initial database contained only a root node <code><tweets/></code>; each incoming tweet is converted from JSON to XML and then inserted into the root node.<br />
The time needed for the inserts includes the conversion time.<br />
<br />
=== Single Updates ===<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Number of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
! Database size (without indexes)<br />
|-<br />
| 1,000,000<br />
| 492.26346<br />
| 8.2<br />
| 3396 MB<br/><br />
|-<br />
| 2,000,000<br />
| 461.87326<br />
| 7.6<br />
| 6997 MB<br/><br />
|-<br />
| 3,000,000<br />
| 470.7054<br />
| 7.8<br />
| 10452 MB<br/><br />
|}<br />
<br />
[[File:insertTweets.png]]<br />
<br />
=== Bulk Updates ===<br />
<br />
coming soon...</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7416Twitter2012-05-30T08:29:02Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It is about the usage of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to Twitter's Streaming API endpoint and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including their meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
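<br />
The conversion step can be illustrated outside BaseX. The following Python sketch is not the JSON Module implementation; it only mimics the naming pattern visible in the XML example on this page, where each underscore in a JSON key is doubled and null or empty values become empty elements. The function name <code>to_element</code> and the <code>value</code> element used for array items are assumptions made for this illustration, and the type attributes on the root element are omitted.<br />

```python
import json
import xml.etree.ElementTree as ET


def to_element(name, value):
    """Build an XML element for one JSON value (illustrative mapping only)."""
    # Double every underscore, as seen in the XML example (retweet__count).
    elem = ET.Element(name.replace("_", "__"))
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_element(key, child))
    elif isinstance(value, list):
        # Assumption: array items are wrapped in <value> elements.
        for item in value:
            elem.append(to_element("value", item))
    elif isinstance(value, bool):
        elem.text = "true" if value else "false"
    elif value is not None and value != "":
        elem.text = str(value)
    # null and empty strings leave the element empty.
    return elem


# A shortened version of the example tweet from this page.
tweet = json.loads(
    '{"text": "Using BaseX for storing the Twitter Stream", "geo": null,'
    ' "retweeted": false, "retweet_count": 0, "entities": {"urls": []},'
    ' "user": {"screen_name": "BaseX", "location": ""}}'
)
xml = ET.tostring(to_element("json", tweet), encoding="unicode")
print(xml)
```

In BaseX itself, the conversion is done by the parse function of the JSON Module, and the resulting fragment is stored with XQuery Update as described above.<br />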
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both forms ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data delivered per hour by the Twitter Streaming API to connected endpoints with the 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30,824,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,284,374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21,406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31,823,776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,325,990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22,099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34,638,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,443,290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24,054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35,982,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,499,290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24,988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
The tests show the time BaseX needs to insert large numbers of real tweets into a database. The results indicate that BaseX scales well and can keep up<br />
with the rate of incoming tweets in the stream. Some values can be lower because tweets differ in size, depending on the meta-data contained in the tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option is set to <code>FALSE</code> (default: <code>SET AUTOFLUSH TRUE</code>)<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
== Insert with XQuery Update ==<br />
<br />
These tests show the performance of BaseX when performing inserts with XQuery Update, either as single updates per tweet or as bulk updates with different numbers of tweets.<br />
The initial database contained only a root node <code><tweets/></code>; each incoming tweet is converted from JSON to XML and then inserted into the root node.<br />
The time needed for the inserts includes the conversion time.<br />
<br />
=== Single Updates ===<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Number of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
! Database size (without indexes)<br />
|-<br />
| 1,000,000<br />
| 492.26346<br />
| 8.2<br />
| 3396 MB<br/><br />
|-<br />
| 2,000,000<br />
| 461.87326<br />
| 7.6<br />
| 6997 MB<br/><br />
|-<br />
| 3,000,000<br />
| 470.7054<br />
| 7.8<br />
| 10452 MB<br/><br />
|}<br />
<br />
[[File:insertTweets.png]]<br />
<br />
=== Bulk Updates ===<br />
<br />
coming soon...</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7415Twitter2012-05-30T08:28:19Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It is about the usage of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to Twitter's Streaming API endpoint and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including their meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both forms ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data delivered per hour by the Twitter Streaming API to connected endpoints with the 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30,824,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,284,374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21,406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31,823,776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,325,990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22,099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34,638,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,443,290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24,054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35,982,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,499,290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24,988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
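Comparing the two examples shows how names and values are mapped: every underscore in a JSON key is doubled in the element name, <code>null</code> values and empty strings or arrays become empty elements, and booleans and numbers are written as text. The Python sketch below imitates only this visible mapping for illustration. It is not the implementation of the XQuery JSON Module, it omits the type attributes on the root element, and the element name <code>value</code> used for array items is an assumption, since the arrays in the example are empty.

```python
import json
import xml.etree.ElementTree as ET


def escape_name(key):
    """Double each underscore, as in the example (in_reply_to -> in__reply__to)."""
    return key.replace("_", "__")


def to_element(name, value):
    """Build an element for one JSON value; null and empty values stay empty."""
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_element(escape_name(key), child))
    elif isinstance(value, list):
        for item in value:
            # "value" is a hypothetical name for array items (assumption).
            elem.append(to_element("value", item))
    elif isinstance(value, bool):
        # Checked before numbers because bool is a subclass of int in Python.
        elem.text = "true" if value else "false"
    elif value is not None and value != "":
        elem.text = str(value)
    return elem


tweet = json.loads(
    '{"text": "Using BaseX for storing the Twitter Stream", "geo": null,'
    ' "retweeted": false, "retweet_count": 0, "entities": {"urls": []},'
    ' "user": {"screen_name": "BaseX", "location": ""}}'
)
print(ET.tostring(to_element("json", tweet), encoding="unicode"))
```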
<br />
= BaseX Performance =<br />
<br />
The tests show the time BaseX needs to insert large numbers of real tweets into a database. We can derive that BaseX scales well: the measured times stay roughly constant (about 460 to 495 seconds) as the database grows, so BaseX can keep up<br />
with the rate of incoming tweets in the stream. Some values are lower than others because the size of the tweets differs according to the meta-data contained in each tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option was set to <code>FALSE</code> for these tests (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
== Insert with XQuery Update ==<br />
<br />
These tests show the performance of BaseX when inserting tweets with XQuery Update, either as single updates (one per tweet) or as bulk updates with different numbers of tweets.<br />
The initial database contained only a root node <code><tweets/></code>; each incoming tweet is converted from JSON to XML and then inserted into the root node.<br />
<br />
=== Single Updates ===<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Number of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
! Database size (without indexes)<br />
|-<br />
| 1,000,000<br />
| 492.26346<br />
| 8.2<br />
| 3396 MB<br />
|-<br />
| 2,000,000<br />
| 461.87326<br />
| 7.6<br />
| 6997 MB<br />
|-<br />
| 3,000,000<br />
| 470.7054<br />
| 7.8<br />
| 10452 MB<br />
|}<br />
<br />
[[File:insertTweets.png]]<br />
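To relate these timings to the stream statistics, the insert rate can be compared with the incoming rate. The following Python sketch assumes that each row of the Single Updates table gives the time for inserting one batch of 1,000,000 tweets (the table does not say this explicitly) and compares the result with the highest average rate from the statistics table, 416 tweets per second. The names <code>RUNS</code>, <code>PEAK_STREAM_RATE</code> and <code>insert_rate</code> are illustrative only.

```python
# (tweets per batch, seconds) taken from the Single Updates table.
# Assumption: every row measures one batch of 1,000,000 tweets.
RUNS = [
    (1_000_000, 492.26346),
    (1_000_000, 461.87326),
    (1_000_000, 470.7054),
]

# Highest "average tweets per second" in the statistics table (6-May-2012).
PEAK_STREAM_RATE = 416


def insert_rate(tweets, seconds):
    """Return the number of tweets inserted per second."""
    return tweets / seconds


for tweets, seconds in RUNS:
    rate = insert_rate(tweets, seconds)
    print(f"{rate:.0f} tweets/s inserted, {rate / PEAK_STREAM_RATE:.1f}x the stream rate")
```

Under this assumption every run inserts more than 2,000 tweets per second, several times the average rate of the 10% stream.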
<br />
=== Bulk Updates ===<br />
<br />
coming soon...</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7414Twitter2012-05-30T08:26:27Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It is about the usage of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and is generating large amounts of data (over 340 millions of short messages ('tweets') daily), it became a really exciting data source for <br />
all kind of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving the data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
For retrieving the Twitter stream we connect with the Streaming API to the endpoint of Twitter and receive a never ending tweet stream. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects the objects has to be<br />
converted into XML fragments. For this purpose the parse function of the [[JSON Module|XQuery JSON Module]] is used. For storing the tweets including the meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitters' Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). In the examples section ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]] show .<br />
The following section shows the amount of data, that is delivered by the Twitter Streaming API to the connected endpoints with the 10% gardenhose access per hour<br />
on the 6th of the months February, March, April and May. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
The tests show the time BaseX needs to insert large numbers of real tweets into a database. The results indicate that BaseX scales well and can keep up<br />
with the rate of incoming tweets in the stream. Some lower values can occur because the tweets differ in size, depending on the meta-data contained in each tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option is set to <code>FALSE</code> for these tests (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
== Insert with XQuery Update ==<br />
<br />
These tests show the performance of BaseX when inserting tweets with XQuery Update, either as a single update per tweet or as bulk updates with varying numbers of tweets.<br />
The initial database contained only a root node <code><tweets/></code>; each incoming tweet is converted from JSON to XML and then inserted into the root node.<br />
<br />
=== Single Updates ===<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
! Database Size (without indexes)<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br />
| 3396 MB<br/><br />
|-<br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br />
| 6792 MB<br/><br />
|-<br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br />
| MB<br/><br />
|-<br />
|}<br />
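<br />
To relate the timings to the stream rate, the first row of the table above can be turned into an insert throughput. The following Python sketch is only a back-of-the-envelope calculation: it reads the first row as 1,000,000 tweets inserted in 492.26346 seconds, and it compares the result with 416 tweets per second, the highest per-second average listed in the Statistics section. The names <code>throughput</code> and <code>stream_rate</code> are chosen for illustration.<br />

```python
def throughput(tweets, seconds):
    """Return the number of tweets inserted per second."""
    return tweets / seconds


inserted = 1_000_000   # first row of the Single Updates table
elapsed = 492.26346    # seconds, same row
stream_rate = 416      # tweets per second, Statistics section (6-May-2012)

rate = throughput(inserted, elapsed)
headroom = rate / stream_rate  # how many times faster than the sampled stream

print(f"{rate:.0f} tweets/s inserted, {headroom:.1f}x the sampled stream rate")
```

Under these assumptions the insert rate is several times the sampled stream rate, which is in line with the statement that BaseX can keep up with the incoming tweets.<br />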
<br />
[[File:insertTweets.png]]<br />
<br />
=== Bulk Updates ===<br />
<br />
{| class="wikitable" width="30%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br/><br />
|-<br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br/><br />
|-<br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br/><br />
|-<br />
|}</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7412Twitter2012-05-30T08:00:47Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and it presents some statistics on the Twitter data and on the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets, including their meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
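<br />
The conversion step can be pictured with a small sketch. The following Python code is not the BaseX JSON Module; it only mimics the element naming visible in the [[#Example Tweet (XML)|XML example]], where each underscore in a JSON key is doubled and <code>null</code> or empty values become empty elements. The function name <code>to_xml</code>, the <code>value</code> wrapper element for array members and the omission of the type attributes on the root element are assumptions made for illustration.<br />

```python
import json
from xml.sax.saxutils import escape


def to_xml(name, value):
    """Serialize one JSON value as an XML element named after its key."""
    tag = name.replace("_", "__")  # doubled underscores, as in the XML example
    if value is None or value == "" or value == [] or value == {}:
        return f"<{tag}/>"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return f"<{tag}>{str(value).lower()}</{tag}>"
    if isinstance(value, dict):
        children = "".join(to_xml(k, v) for k, v in value.items())
        return f"<{tag}>{children}</{tag}>"
    if isinstance(value, list):
        # Assumption: each array member is wrapped in a <value> element.
        items = "".join(to_xml("value", v) for v in value)
        return f"<{tag}>{items}</{tag}>"
    return f"<{tag}>{escape(str(value))}</{tag}>"


tweet = json.loads(
    '{"text": "Using BaseX", "retweeted": false, "geo": null, '
    '"user": {"screen_name": "BaseX", "followers_count": 860}}'
)
xml = to_xml("json", tweet)
print(xml)
```

The resulting fragment is what would then be handed to the XQuery Update insert; the sketch stops before that step.<br />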
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both formats ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the number of tweets that the Twitter Streaming API delivered per hour to connected endpoints with 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
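<br />
The averages in the table are consistent with dividing each daily total by 24 hours, 1,440 minutes and 86,400 seconds and rounding down. The following Python check uses the totals copied from the table (the dots there are thousands separators); the function name <code>averages</code> is chosen for illustration.<br />

```python
# Daily totals copied from the table above.
totals = {
    "6-Feb-2012": 30_824_976,
    "6-Mar-2012": 31_823_776,
    "6-Apr-2012": 34_638_976,
    "6-May-2012": 35_982_976,
}


def averages(total):
    """Return (per hour, per minute, per second), each rounded down."""
    return total // 24, total // (24 * 60), total // (24 * 60 * 60)


for day, total in totals.items():
    per_hour, per_minute, per_second = averages(total)
    print(f"{day}: {per_hour}/h, {per_minute}/min, {per_second}/s")
```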
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
The tests show the time BaseX needs to insert large numbers of real tweets into a database. The results indicate that BaseX scales well and can keep up<br />
with the rate of incoming tweets in the stream. Some lower values can occur because the tweets differ in size, depending on the meta-data contained in each tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option is set to <code>FALSE</code> for these tests (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
== Insert with XQuery Update ==<br />
<br />
These tests show the performance of BaseX when inserting tweets with XQuery Update, either as a single update per tweet or as bulk updates with varying numbers of tweets.<br />
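<br />
The two modes differ only in how many tweets are handed to one update operation. The following Python sketch illustrates the grouping; it is not taken from the test setup. The <code>insert</code> callback is a hypothetical stand-in for whatever executes the XQuery Update insert, and <code>batches</code> and <code>store</code> are illustrative names.<br />

```python
from itertools import islice


def batches(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def store(tweets, bulk_size, insert):
    """Pass tweets to `insert` in groups; return the number of update calls."""
    calls = 0
    for batch in batches(tweets, bulk_size):
        insert(batch)  # hypothetical: one update operation per batch
        calls += 1
    return calls


received = []
single_calls = store(range(10), 1, received.append)  # one update per tweet
bulk_calls = store(range(10), 4, received.append)    # bulk updates of up to 4
print(single_calls, bulk_calls)
```

With a bulk size of 1 the sketch degenerates to single updates, so both test modes can be driven by the same loop.<br />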
<br />
=== Single Updates ===<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
! Database Size (without indexes)<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br />
| 3396 MB<br/><br />
|-<br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br />
| 10187 MB<br/><br />
|-<br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br />
| 13583 MB<br/><br />
|-<br />
|}<br />
<br />
[[File:insertTweets.png]]<br />
<br />
=== Bulk Updates ===<br />
<br />
{| class="wikitable" width="30%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br/><br />
|-<br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br/><br />
|-<br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br/><br />
|-<br />
|}</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7411Twitter2012-05-30T07:48:50Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and it presents some statistics on the Twitter data and on the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets, including their meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both formats ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the number of tweets that the Twitter Streaming API delivered per hour to connected endpoints with 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
The tests show the time BaseX needs to insert large numbers of real tweets into a database. The results indicate that BaseX scales well and can keep up<br />
with the rate of incoming tweets in the stream. Some lower values can occur because the tweets differ in size, depending on the meta-data contained in each tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option is set to <code>FALSE</code> for these tests (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
== Insert with XQuery Update ==<br />
<br />
These tests show the performance of BaseX when inserting tweets with XQuery Update, either as a single update per tweet or as bulk updates with varying numbers of tweets.<br />
<br />
=== Single Updates ===<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
! Database Size (without indexes)<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br />
| 3996 MB<br/><br />
|-<br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br />
| 10187 MB<br/><br />
|-<br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br />
| 13583 MB<br/><br />
|-<br />
|}<br />
<br />
[[File:insertTweets.png]]<br />
<br />
=== Bulk Updates ===<br />
<br />
{| class="wikitable" width="30%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br/><br />
|-<br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br/><br />
|-<br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br/><br />
|-<br />
|}</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7410Twitter2012-05-30T07:04:07Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and it presents some statistics on the Twitter data and on the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets, including their meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both formats ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the number of tweets that the Twitter Streaming API delivered per hour to connected endpoints with 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
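
The mapping between the two examples above can be illustrated with a short sketch. It is written in Python rather than XQuery and is not the implementation of the JSON Module; the helper names <code>xml_name</code> and <code>to_element</code> are hypothetical, the use of <code>value</code> elements for array items is an assumption, and the type attributes on the root element are not reproduced. It only shows the rules visible in the example: underscores in names are doubled, and <code>null</code> values and empty strings become empty elements.

```python
import json
import xml.etree.ElementTree as ET


def xml_name(key):
    # Double every underscore, as in the example (created_at -> created__at).
    return key.replace("_", "__")


def to_element(name, value):
    """Build an element for one JSON value (illustrative mapping only)."""
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_element(xml_name(key), child))
    elif isinstance(value, list):
        # Assumption: each array item becomes a child element named "value".
        for item in value:
            elem.append(to_element("value", item))
    elif isinstance(value, bool):
        # Checked before numbers, because bool is a subclass of int in Python.
        elem.text = "true" if value else "false"
    elif value is not None and value != "":
        elem.text = str(value)
    # null and empty strings are left as empty elements
    return elem


tweet = json.loads(
    '{"text": "Using BaseX", "geo": null, "retweeted": false, '
    '"retweet_count": 0, "entities": {"user_mentions": []}}'
)
xml = ET.tostring(to_element("json", tweet), encoding="unicode")
print(xml)
```

Checking for booleans before numbers matters in this sketch: otherwise <code>false</code> would be serialized in Python's own spelling instead of the lowercase form used in the XML example.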
<br />
= BaseX Performance =<br />
<br />
The tests show the time BaseX needs to insert large numbers of real tweets into a database. The results suggest that BaseX scales well and can keep up
with the rate of incoming tweets in the stream. Some timings are lower than others because the size of a tweet varies with the meta-data contained in the tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option was set to <code>FALSE</code> for these tests (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
== Insert with XQuery Update ==<br />
<br />
These tests measure the performance of BaseX when inserting tweets with XQuery Update, either as a single update per tweet or as bulk updates with varying numbers of tweets.<br />
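
The article does not describe how tweets were grouped for the bulk case. As a rough illustration only, a client could collect tweets from the stream into fixed-size groups before issuing one update per group. The helper <code>batches</code> below is hypothetical and written in Python; it shows the grouping step and nothing BaseX-specific.

```python
from itertools import islice


def batches(items, size):
    """Yield lists of at most `size` consecutive items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Seven placeholder tweets grouped into bulk updates of up to three tweets.
for group in batches(range(7), 3):
    print(len(group), "tweets in this bulk update")
```

With a group size of 1 the same loop degenerates to the single-update case, which makes it easy to compare both strategies with one client.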
<br />
=== Single Updates ===<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
! Database Size (without indexes)<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br />
| <br/><br />
|-<br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br />
| <br/><br />
|-<br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br />
| <br/><br />
|-<br />
|}<br />
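
To relate these timings to the stream rate, the following Python sketch converts them into an insert rate. It assumes that each row reports the time needed to insert one batch of 1,000,000 tweets, which the table does not state explicitly, and it uses the highest average rate from the Statistics section (416 tweets per second on 6 May 2012) as the reference. The names <code>BATCH_SIZE</code> and <code>STREAM_RATE</code> are illustrative.

```python
# Timings copied from the table above. Assumption: each row is the time
# needed to insert one batch of 1,000,000 tweets.
BATCH_SIZE = 1_000_000
batch_seconds = [492.26346, 461.87326, 470.7054]

# Highest average stream rate in the Statistics section (6 May 2012).
STREAM_RATE = 416  # tweets per second

for seconds in batch_seconds:
    insert_rate = BATCH_SIZE / seconds      # tweets inserted per second
    headroom = insert_rate / STREAM_RATE    # how many times faster than the stream
    print(f"{insert_rate:.0f} tweets/s inserted, {headroom:.1f}x the stream rate")
```

Under that assumption, each batch is inserted at roughly 2,000 tweets per second, about five times the average rate of the gardenhose stream.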
<br />
[[File:insertTweets.png]]<br />
<br />
=== Bulk Updates ===<br />
<br />
{| class="wikitable" width="30%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br/><br />
|-<br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br/><br />
|-<br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br/><br />
|-<br />
|}</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7409Twitter2012-05-30T07:03:50Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and presents some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an attractive data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint of the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including their meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both formats ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data per hour that the Twitter Streaming API delivers to endpoints connected with the 10% gardenhose access,<br />
measured on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
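
The averages in the table can be reproduced from the daily totals. The Python sketch below assumes that each average was derived from the previous one by integer division (rounding down), which matches the published figures; the function name <code>averages</code> and the ISO date keys are illustrative.

```python
# Daily totals copied from the table above (10% gardenhose stream).
daily_totals = {
    "2012-02-06": 30_824_976,
    "2012-03-06": 31_823_776,
    "2012-04-06": 34_638_976,
    "2012-05-06": 35_982_976,
}


def averages(total_per_day):
    """Return (per hour, per minute, per second), rounded down at each step."""
    per_hour = total_per_day // 24
    per_minute = per_hour // 60
    per_second = per_minute // 60
    return per_hour, per_minute, per_second


for day, total in daily_totals.items():
    print(day, averages(total))
```

Rounding down at each step explains why, for example, the April figure is listed as 400 tweets per second although the exact quotient is slightly below 401.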
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
The tests show the time BaseX needs to insert large numbers of real tweets into a database. The results suggest that BaseX scales well and can keep up
with the rate of incoming tweets in the stream. Some timings are lower than others because the size of a tweet varies with the meta-data contained in the tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option was set to <code>FALSE</code> for these tests (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
== Insert with XQuery Update ==<br />
<br />
These tests measure the performance of BaseX when inserting tweets with XQuery Update, either as a single update per tweet or as bulk updates with varying numbers of tweets.<br />
<br />
=== Single Updates ===<br />
<br />
{| class="wikitable" width="30%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
! Database Size (without indexes)<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br />
| <br/><br />
|-<br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br />
| <br/><br />
|-<br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br />
| <br/><br />
|-<br />
|}<br />
<br />
[[File:insertTweets.png]]<br />
<br />
=== Bulk Updates ===<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br/><br />
|-<br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br/><br />
|-<br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br/><br />
|-<br />
|}</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7408Twitter2012-05-30T07:03:39Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and presents some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an attractive data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint of the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including their meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both formats ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data per hour that the Twitter Streaming API delivers to endpoints connected with the 10% gardenhose access,<br />
measured on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
The tests show the time BaseX needs to insert large numbers of real tweets into a database. The results suggest that BaseX scales well and can keep up
with the rate of incoming tweets in the stream. Some timings are lower than others because the size of a tweet varies with the meta-data contained in the tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option was set to <code>FALSE</code> for these tests (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
== Insert with XQuery Update ==<br />
<br />
These tests measure the performance of BaseX when inserting tweets with XQuery Update, either as a single update per tweet or as bulk updates with varying numbers of tweets.<br />
<br />
=== Single Updates ===<br />
<br />
{| class="wikitable" width="30%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
! Database Size (without indexes)<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br />
| <br/><br />
|-<br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br />
| <br/><br />
|-<br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br />
| <br/><br />
|-<br />
|}<br />
<br />
[[File:insertTweets.png]]<br />
<br />
=== Bulk Updates ===<br />
<br />
{| class="wikitable" width="30%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br/><br />
|-<br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br/><br />
|-<br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br/><br />
|-<br />
|}</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7407Twitter2012-05-30T07:03:10Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and presents some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an attractive data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage =<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint of the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including their metadata, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
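
The JSON-to-XML conversion described above can be illustrated with a minimal Python sketch. This is not the XQuery JSON Module itself, only a stand-in that mimics the element naming visible in the XML example (each underscore in a key is doubled, null values become empty elements). The function names <code>to_element</code> and <code>tweet_to_xml</code> are hypothetical, wrapping array items in <code>value</code> elements is an assumption, and the type attributes on the root element are omitted.

```python
import json
import xml.etree.ElementTree as ET


def to_element(name, value):
    """Convert one JSON value into an XML element (hypothetical helper)."""
    # Underscores in JSON keys are doubled, as in the XML example tweet.
    elem = ET.Element(name.replace("_", "__"))
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_element(key, child))
    elif isinstance(value, list):
        # Assumption: each array item is wrapped in a <value> element.
        for item in value:
            elem.append(to_element("value", item))
    elif isinstance(value, bool):
        # Checked before the generic case so True/False become true/false.
        elem.text = "true" if value else "false"
    elif value is not None:
        elem.text = str(value)
    # None (JSON null) leaves the element empty.
    return elem


def tweet_to_xml(tweet_json):
    """Parse a JSON string and serialize it as XML under a <json> root."""
    root = to_element("json", json.loads(tweet_json))
    return ET.tostring(root, encoding="unicode")


sample = '{"text": "hi", "geo": null, "retweet_count": 0, "retweeted": false, "entities": {"urls": []}}'
xml_fragment = tweet_to_xml(sample)
```

In the setup described here, the resulting fragment would then be passed to an XQuery Update ''insert'' expression; that step is not shown in the sketch.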
<br />
= Twitter's Streaming Data =<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both formats ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data delivered per hour by the Twitter Streaming API to endpoints connected with the 10% gardenhose access,<br />
measured on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30,824,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,284,374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21,406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31,823,776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,325,990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22,099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34,638,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,443,290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24,054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35,982,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,499,290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24,988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|}<br />
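
The averages in the table appear to be derived from the daily total by successive integer divisions (24 hours, then 60 minutes, then 60 seconds), each step rounding down. The following Python sketch reproduces the figures for 6-Feb-2012 under that assumption; the function name <code>daily_averages</code> is hypothetical and not part of the original measurement setup.

```python
def daily_averages(total_tweets):
    """Return (per hour, per minute, per second) averages for one day.

    Assumption: each average is the floor of the previous one divided
    by the next time unit, which matches the values in the table.
    """
    per_hour = total_tweets // 24
    per_minute = per_hour // 60
    per_second = per_minute // 60
    return per_hour, per_minute, per_second


# Daily total for Mon, 6-Feb-2012, taken from the table above.
feb_averages = daily_averages(30824976)
```

The same function applied to the totals of the other three days yields the remaining rows of the table.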
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
The tests show the time BaseX needs to insert large amounts of real tweets into a database. We can infer that BaseX scales very well and can keep up<br />
with the incoming number of tweets in the stream. Some lower values can occur because the size of the tweets differs depending on the metadata contained in each tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option is set to <code>FALSE</code> (default: <code>SET AUTOFLUSH TRUE</code>)<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
== Insert with XQuery Update ==<br />
<br />
These tests show the performance of BaseX when inserting tweets with XQuery Update, either as a single update per tweet or as bulk updates with different numbers of tweets.<br />
<br />
=== Single Updates ===<br />
<br />
{| class="wikitable" width="30%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
! Database Size<br />
|-<br />
| 1,000,000<br />
| 492.26346<br />
| 8.2<br />
| <br/><br />
|-<br />
| 2,000,000<br />
| 461.87326<br />
| 7.6<br />
| <br/><br />
|-<br />
| 3,000,000<br />
| 470.7054<br />
| 7.8<br />
| <br/><br />
|}<br />
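
To relate these timings to the incoming stream, the insert throughput can be compared with the average arrival rate from the statistics table. The Python sketch below assumes that the first row is the time needed to insert 1,000,000 tweets; how the other rows were measured is not stated, so they are left out. The name <code>tweets_per_second</code> is hypothetical.

```python
def tweets_per_second(tweet_count, elapsed_seconds):
    """Average insert throughput in tweets per second."""
    return tweet_count / elapsed_seconds


# First row of the table: 1,000,000 tweets in 492.26346 seconds (assumed meaning).
insert_rate = tweets_per_second(1_000_000, 492.26346)

# Highest average arrival rate in the statistics table (Sun, 6-May-2012).
incoming_rate = 416

# Factor by which the insert rate exceeds the arrival rate.
headroom = insert_rate / incoming_rate
```

Under this reading, single updates run at roughly 2,000 tweets per second, several times the average arrival rate of the gardenhose stream on the measured days.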
<br />
[[File:insertTweets.png]]<br />
<br />
=== Bulk Updates ===<br />
<br />
{| class="wikitable" width="30%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
|-<br />
| 1,000,000<br />
| 492.26346<br />
| 8.2<br/><br />
|-<br />
| 2,000,000<br />
| 461.87326<br />
| 7.6<br/><br />
|-<br />
| 3,000,000<br />
| 470.7054<br />
| 7.8<br/><br />
|}</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7406Twitter2012-05-30T07:02:43Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and presents some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage =<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint of the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including their metadata, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
= Twitter's Streaming Data =<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both formats ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data delivered per hour by the Twitter Streaming API to endpoints connected with the 10% gardenhose access,<br />
measured on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30,824,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,284,374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21,406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31,823,776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,325,990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22,099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34,638,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,443,290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24,054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35,982,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,499,290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24,988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
The tests show the time BaseX needs to insert large amounts of real tweets into a database. We can infer that BaseX scales very well and can keep up<br />
with the incoming number of tweets in the stream. Some lower values can occur because the size of the tweets differs depending on the metadata contained in each tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option is set to <code>FALSE</code> (default: <code>SET AUTOFLUSH TRUE</code>)<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
== Insert with XQuery Update ==<br />
<br />
These tests show the performance of BaseX when inserting tweets with XQuery Update, either as a single update per tweet or as bulk updates with different numbers of tweets.<br />
<br />
=== Single Updates ===<br />
<br />
{| class="wikitable" width="30%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
! Database Size<br />
|-<br />
| 1,000,000<br />
| 492.26346<br />
| 8.2<br />
| <br/><br />
|-<br />
| 2,000,000<br />
| 461.87326<br />
| 7.6<br />
| <br/><br />
|-<br />
| 3,000,000<br />
| 470.7054<br />
| 7.8<br />
| <br/><br />
|}<br />
<br />
[[File:insertTweets.png]]<br />
<br />
=== Bulk Updates ===<br />
<br />
{| class="wikitable" width="30%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
|-<br />
| 1,000,000<br />
| 492.26346<br />
| 8.2<br/><br />
|-<br />
| 2,000,000<br />
| 461.87326<br />
| 7.6<br/><br />
|-<br />
| 3,000,000<br />
| 470.7054<br />
| 7.8<br/><br />
|}</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7405Twitter2012-05-30T07:00:59Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and presents some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage =<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint of the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including their metadata, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
= Twitter's Streaming Data =<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both formats ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data delivered per hour by the Twitter Streaming API to endpoints connected with the 10% gardenhose access,<br />
measured on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30,824,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,284,374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21,406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31,823,776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,325,990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22,099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34,638,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,443,290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24,054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35,982,976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,499,290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24,988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
The tests show the time BaseX needs to insert large amounts of real tweets into a database. The results indicate that BaseX scales well and can keep up<br />
with the rate of incoming tweets in the stream. Some values are lower than others because the size of a tweet varies with the meta-data contained in the tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option is set to <code>FALSE</code> for these tests (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
== Insert with XQuery Update ==<br />
<br />
These tests show the performance of BaseX when inserting tweets with XQuery Update, either as a single update per tweet or as bulk updates with different numbers of tweets.<br />
<br />
=== Single Updates ===<br />
<br />
{| class="wikitable" width="30%"<br />
|-<br />
! Number of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br />
|}<br />
<br />
[[File:insertTweets.png]]</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7404Twitter2012-05-30T07:00:27Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and presents some statistics about the Twitter data and the performance of BaseX.<br />
<br />
[http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), which has made it an attractive data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint of the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets, including the meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
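<br />
The mapping from JSON to XML can be illustrated outside of BaseX. The following Python sketch is not the implementation of the JSON Module; it only imitates the conventions visible in the XML example of this article: a <code>json</code> root element, underscores in names doubled, empty elements for <code>null</code>, and the names of typed elements collected in attributes of the root. The function names and the <code>value</code> element used for array items are assumptions made for this illustration.<br />

```python
import json
import xml.etree.ElementTree as ET


def escape_name(key):
    """Double each underscore, as in the XML example (retweet_count -> retweet__count)."""
    return key.replace("_", "__")


def fill(parent, value, types):
    """Store a JSON value in `parent` and record the element name under its JSON type."""
    name = parent.tag
    if isinstance(value, dict):
        types["objects"].append(name)
        for key, child in value.items():
            fill(ET.SubElement(parent, escape_name(key)), child, types)
    elif isinstance(value, list):
        types["arrays"].append(name)
        for item in value:
            # "value" as the item name is an assumption of this sketch.
            fill(ET.SubElement(parent, "value"), item, types)
    elif value is None:
        types["nulls"].append(name)  # null becomes an empty element
    elif isinstance(value, bool):  # checked before numbers: bool is a subclass of int
        types["booleans"].append(name)
        parent.text = "true" if value else "false"
    elif isinstance(value, (int, float)):
        types["numbers"].append(name)
        parent.text = str(value)
    else:
        parent.text = value  # strings carry no type marker


def tweet_to_xml(text):
    """Convert one JSON tweet (a string) into an ElementTree element named json."""
    types = {kind: [] for kind in ("booleans", "numbers", "nulls", "arrays", "objects")}
    root = ET.Element("json")
    fill(root, json.loads(text), types)
    for kind, names in types.items():
        if names:
            # dict.fromkeys removes duplicate names and keeps their first-seen order
            root.set(kind, " ".join(dict.fromkeys(names)))
    return root


if __name__ == "__main__":
    sample = '{"text": "Using BaseX", "retweet_count": 0, "geo": null, "user": {"screen_name": "BaseX"}}'
    print(ET.tostring(tweet_to_xml(sample), encoding="unicode"))
```

Keeping the type information in attributes of the root element keeps the element content itself free of markers, which is presumably why the example stores it that way.<br />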
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both formats ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data delivered per hour by the Twitter Streaming API to connected endpoints with 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
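<br />
The averages in the table follow directly from the daily totals. The short Python sketch below recomputes them; it assumes that the averages are rounded down to whole tweets, which is an inference from the numbers in the table and not something the measurements state. The names <code>DAILY_TOTALS</code> and <code>averages</code> are chosen for this illustration.<br />

```python
# Daily totals copied from the table above.
DAILY_TOTALS = {
    "Mon, 6-Feb-2012": 30_824_976,
    "Tue, 6-Mar-2012": 31_823_776,
    "Fri, 6-Apr-2012": 34_638_976,
    "Sun, 6-May-2012": 35_982_976,
}


def averages(total):
    """Return the average tweets per hour, minute and second, rounded down."""
    return total // 24, total // (24 * 60), total // (24 * 60 * 60)


for day, total in DAILY_TOTALS.items():
    per_hour, per_minute, per_second = averages(total)
    print(f"{day}: {per_hour} per hour, {per_minute} per minute, {per_second} per second")
```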
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
The tests show the time BaseX needs to insert large amounts of real tweets into a database. The results indicate that BaseX scales well and can keep up<br />
with the rate of incoming tweets in the stream. Some values are lower than others because the size of a tweet varies with the meta-data provided by the tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option is set to <code>FALSE</code> for these tests (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
== Insert with XQuery Update ==<br />
<br />
These tests show the performance of BaseX when inserting tweets with XQuery Update, either as a single update per tweet or as bulk updates with different numbers of tweets.<br />
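<br />
The two modes differ only in how many tweets are handed to one update. The following Python sketch is not BaseX code; it merely shows how an incoming stream could be grouped before each group is passed to an update. The helper name <code>batches</code> is hypothetical. A size of 1 corresponds to single updates, larger sizes to bulk updates.<br />

```python
from itertools import islice


def batches(tweets, size):
    """Yield lists of up to `size` tweets from any iterable, including an endless stream."""
    iterator = iter(tweets)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Five placeholder tweets grouped into bulk updates of two tweets each.
for batch in batches(["t1", "t2", "t3", "t4", "t5"], 2):
    print(batch)
```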
<br />
=== Single Updates ===<br />
<br />
{| class="wikitable" width="30%"<br />
|-<br />
! Number of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br />
|}<br />
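<br />
To relate these timings to the stream, the insert rate can be compared with the highest average rate from the statistics table (416 tweets per second on 6-May-2012). The Python sketch below reads each row as the time needed for one run of one million tweets; the table does not state this explicitly, so it is an assumption. If the rows were cumulative instead, the resulting rates would be higher still.<br />

```python
# Times copied from the table above; one million tweets per row is an assumption.
BATCH_SIZE = 1_000_000
SECONDS_PER_BATCH = [492.26346, 461.87326, 470.7054]
PEAK_STREAM_RATE = 416  # average tweets per second on 6-May-2012 (statistics table)


def throughput(seconds, tweets=BATCH_SIZE):
    """Return the number of tweets inserted per second."""
    return tweets / seconds


rates = [throughput(seconds) for seconds in SECONDS_PER_BATCH]
for seconds, rate in zip(SECONDS_PER_BATCH, rates):
    print(f"{seconds:.1f} s -> {rate:.0f} tweets per second, "
          f"{rate / PEAK_STREAM_RATE:.1f} times the peak stream rate")
```

Under this reading, every run inserts several times more tweets per second than the stream delivers, which is consistent with the statement that BaseX can keep up with the incoming tweets.<br />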
<br />
[[File:insertTweets.png]]</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7402Twitter2012-05-29T14:48:21Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and presents some statistics about the Twitter data and the performance of BaseX.<br />
<br />
[http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), which has made it an attractive data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint of the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets, including the meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both formats ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data delivered per hour by the Twitter Streaming API to connected endpoints with 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
The first test shows the time BaseX needs to insert large amounts of real tweets into a database. The results indicate that BaseX scales well and can keep up<br />
with the rate of incoming tweets in the stream. Some values are lower than others because the size of a tweet varies with the meta-data provided by the tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option is set to <code>FALSE</code> for this test (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
{| class="wikitable" width="30%"<br />
|-<br />
! Number of tweets<br />
! Time in seconds<br />
! Time in minutes<br />
|-<br />
| 1.000.000<br />
| 492.26346<br />
| 8.2<br />
|-<br />
| 2.000.000<br />
| 461.87326<br />
| 7.6<br />
|-<br />
| 3.000.000<br />
| 470.7054<br />
| 7.8<br />
|}<br />
<br />
[[File:insertTweets.png]]</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7401Twitter2012-05-29T14:42:04Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and presents some statistics about the Twitter data and the performance of BaseX.<br />
<br />
[http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), which has made it an attractive data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint of the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets, including the meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both formats ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data delivered per hour by the Twitter Streaming API to connected endpoints with 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
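
The mapping between the two examples can be sketched outside BaseX as well. The following Python snippet is only an illustration of the element naming visible above (underscores in JSON keys are doubled, `null` becomes an empty element); it is not the implementation of the XQuery JSON Module. It omits the type attributes (`booleans`, `numbers`, `nulls`, ...) of the root element, and the element name `value` for array items is an assumption, since the example tweet contains only empty arrays.

```python
import json
import xml.etree.ElementTree as ET


def escape_name(key):
    # Double each underscore, as seen in the XML example above.
    return key.replace("_", "__")


def to_xml(name, value):
    """Convert a parsed JSON value into an XML element called `name`."""
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(escape_name(key), child))
    elif isinstance(value, list):
        for item in value:
            # Assumption: array items are wrapped in <value> elements.
            elem.append(to_xml("value", item))
    elif isinstance(value, bool):
        # Checked before numbers, because bool is a subclass of int in Python.
        elem.text = "true" if value else "false"
    elif value is None:
        pass  # null becomes an empty element
    else:
        elem.text = str(value)
    return elem


# A shortened version of the example tweet.
tweet = json.loads(
    '{"text": "Using BaseX for storing the Twitter Stream", '
    '"retweeted": false, "geo": null, "retweet_count": 0, '
    '"entities": {"urls": []}, '
    '"user": {"screen_name": "BaseX", "location": ""}}'
)

root = to_xml("json", tweet)
print(ET.tostring(root, encoding="unicode"))
```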
<br />
= BaseX Performance =<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
The first test shows the time BaseX needs to insert large amounts of real tweets into a database. The results suggest that BaseX scales very well and can keep up<br />
with the incoming amount of tweets in the stream. Some values can be lower, because the size of the tweets differs according to the meta-data provided by the tweet object.<br /><br />
Note: For this test, the <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option was set to <code>FALSE</code> (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
{| class="wikitable" width="30%"<br />
|-<br />
! Amount of tweets<br />
! Time in seconds<br />
|-<br />
| 1.000.000<br />
| 492.26346<br/><br />
|-<br />
| 2.000.000<br />
| 461.87326<br/><br />
|-<br />
| 3.000.000<br />
| 470.7054<br/><br />
|-<br />
|}<br />
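
To relate these timings to the stream rates in the statistics section, they can be converted into a throughput figure. The table does not state whether each time covers the whole batch or only the most recent million tweets; the sketch below assumes the more conservative reading (each time covers one million tweets) and compares the result with the highest average rate measured above (416 tweets per second on 6 May 2012). The names `BATCH`, `PEAK_STREAM_RATE` and `throughput` are illustrative.

```python
BATCH = 1_000_000        # assumption: each timing covers one million tweets
PEAK_STREAM_RATE = 416   # tweets per second, from the statistics table

timings = [492.26346, 461.87326, 470.7054]  # seconds, from the table above


def throughput(seconds, batch=BATCH):
    """Tweets inserted per second for one timed batch."""
    return batch / seconds


rates = [throughput(t) for t in timings]

for t, r in zip(timings, rates):
    print(f"{t:.2f} s -> {r:.0f} tweets/s")

print("keeps up with stream:", min(rates) > PEAK_STREAM_RATE)
```

Under this assumption, the slowest batch still corresponds to roughly 2,000 inserted tweets per second, about five times the sampled stream rate.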
<br />
[[File:insertTweets.png]]</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7400Twitter2012-05-29T14:39:02Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and presents some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become a really exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage =<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including the meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
= Twitter's Streaming Data =<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a [[#Example Tweet (JSON)|tweet as JSON]] and the same [[#Example Tweet (XML)|tweet as XML]].<br />
The following section shows the amount of data that is delivered per hour by the Twitter Streaming API to the connected endpoints with the 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
The first test shows the time BaseX needs to insert large amounts of real tweets into a database. The results suggest that BaseX scales very well and can keep up<br />
with the incoming amount of tweets in the stream. Some values can be lower, because the size of the tweets differs according to the meta-data provided by the tweet object.<br /><br />
Note: For this test, the <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option was set to <code>FALSE</code> (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
|}<br />
<br />
[[File:insertTweets.png]]</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7399Twitter2012-05-29T14:37:48Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and presents some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become a really exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage =<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including the meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
= Twitter's Streaming Data =<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a [[#Example Tweet (JSON)|tweet as JSON]] and the same [[#Example Tweet (XML)|tweet as XML]].<br />
The following section shows the amount of data that is delivered per hour by the Twitter Streaming API to the connected endpoints with the 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM<br />
<br />
The first test shows the time BaseX needs to insert large amounts of real tweets into a database. The results suggest that BaseX scales very well and can keep up<br />
with the incoming amount of tweets in the stream. Some values can be lower, because the size of the tweets differs according to the meta-data provided by the tweet object.<br /><br />
Note: For this test, the <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option was set to <code>FALSE</code> (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
<br />
[[File:insertTweets.png]]</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7398Twitter2012-05-29T14:29:44Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and presents some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become a really exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage =<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including the meta-data, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
= Twitter's Streaming Data =<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a [[#Example Tweet (JSON)|tweet as JSON]] and the same [[#Example Tweet (XML)|tweet as XML]].<br />
The following section shows the amount of data that is delivered per hour by the Twitter Streaming API to the connected endpoints with the 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
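<br />
The averages in the table can be re-derived from the daily totals. The following is a minimal sketch, not part of the original measurement setup: the function name <code>averages</code> is hypothetical, and rounding down is an assumption that happens to match the published figures. The table uses periods as thousands separators, whereas the sketch prints commas.<br />

```python
# Daily totals copied from the table above (10% gardenhose stream, 2012).
DAILY_TOTALS = {
    "Mon, 6-Feb-2012": 30_824_976,
    "Tue, 6-Mar-2012": 31_823_776,
    "Fri, 6-Apr-2012": 34_638_976,
    "Sun, 6-May-2012": 35_982_976,
}


def averages(total):
    """Return (per hour, per minute, per second) for one day, rounded down.

    Rounding down (integer division) is an assumption of this sketch.
    """
    return total // 24, total // (24 * 60), total // (24 * 60 * 60)


for day, total in DAILY_TOTALS.items():
    per_hour, per_minute, per_second = averages(total)
    print(f"{day}: {per_hour:,} / h, {per_minute:,} / min, {per_second:,} / s")
```

Since the gardenhose access delivers roughly 10% of the public stream, these averages describe the sampled stream only, not the full Twitter volume.<br />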
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
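<br />
The mapping between the two examples above can be illustrated outside of BaseX. The following Python sketch is not the implementation of the JSON Module; it only mimics the visible conventions of the example output: each JSON key becomes an element, underscores in names are doubled, and <code>null</code> values become empty elements. The function names <code>to_xml_name</code>, <code>build</code> and <code>tweet_to_xml</code> are hypothetical, the element name <code>value</code> for array entries is an assumption, and the type attributes on the root element (<code>booleans</code>, <code>numbers</code>, ...) are omitted.<br />

```python
import json
import xml.etree.ElementTree as ET


def to_xml_name(key):
    """Double each underscore, as seen in the example output above."""
    return key.replace("_", "__")


def build(parent, name, value):
    """Append one JSON value as a child element of parent."""
    node = ET.SubElement(parent, to_xml_name(name))
    if isinstance(value, dict):
        for key, item in value.items():
            build(node, key, item)
    elif isinstance(value, list):
        for item in value:
            # Assumption: array entries are wrapped in <value> elements.
            build(node, "value", item)
    elif isinstance(value, bool):
        # bool must be tested before the generic case, as str(True) is "True".
        node.text = "true" if value else "false"
    elif value is not None:
        node.text = str(value)
    # null values leave the element empty.
    return node


def tweet_to_xml(text):
    """Parse a JSON object string and return a <json> root element."""
    root = ET.Element("json")
    for key, item in json.loads(text).items():
        build(root, key, item)
    return root


sample = (
    '{"text": "Using BaseX", "retweet_count": 0, "geo": null, '
    '"user": {"screen_name": "BaseX", "geo_enabled": false}}'
)
print(ET.tostring(tweet_to_xml(sample), encoding="unicode"))
```

Doubling the underscores keeps the conversion reversible: a single underscore never appears in a converted name, so it remains free for escaping other characters.<br />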
<br />
= BaseX Performance =<br />
<br />
The first test shows the time BaseX needs to insert large amounts of real tweets into a database. We can conclude that BaseX scales very well and can keep up<br />
with the number of incoming tweets in the stream. Some lower values can occur because the size of the tweets differs according to the metadata provided by the tweet object.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option is set to <code>FALSE</code> (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
<br />
[[File:insertTweets.png]]</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7397Twitter2012-05-29T14:28:20Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It is about the usage of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Streaming API endpoint of Twitter and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, they have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets together with their metadata, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both formats ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data per hour that the Twitter Streaming API delivered to endpoints connected with 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
The first test shows the time BaseX needs to insert large amounts of real tweets into a database. We can conclude that BaseX scales very well and can keep up<br />
with the number of incoming tweets in the stream.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option is set to <code>FALSE</code> (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
<br />
[[File:insertTweets.png]]</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7396Twitter2012-05-29T14:26:36Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It is about the usage of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Streaming API endpoint of Twitter and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, they have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets together with their metadata, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both formats ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data per hour that the Twitter Streaming API delivered to endpoints connected with 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
= BaseX Performance =<br />
<br />
The first test shows the time BaseX needs to insert large amounts of tweets into a database.<br /><br />
Note: The <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option is set to <code>FALSE</code> (default: <code>SET AUTOFLUSH TRUE</code>)<br />
<br />
[[File:insertTweets.png]]</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7395Twitter2012-05-29T14:26:22Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It is about the usage of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to Twitter's endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including their metadata, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a tweet both [[#Example Tweet (JSON)|as JSON]] and [[#Example Tweet (XML)|as XML]].<br />
The following section shows the amount of data per hour that the Twitter Streaming API delivers to connected endpoints with the 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
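
The averages in the table appear to follow directly from the daily totals: dividing each total by 24, by 1,440 and by 86,400 and rounding down reproduces the per-hour, per-minute and per-second figures. The following Python sketch illustrates this arithmetic. The function name <code>tweet_averages</code> is an assumption made for this example and is not part of BaseX or of the original measurement setup.

```python
def tweet_averages(total_per_day):
    """Return (per hour, per minute, per second) averages, rounded down."""
    return (
        total_per_day // 24,              # hours per day
        total_per_day // (24 * 60),       # minutes per day
        total_per_day // (24 * 60 * 60),  # seconds per day
    )


# Daily totals copied from the table above (thousands separators removed).
totals = {
    "Mon, 6-Feb-2012": 30824976,
    "Tue, 6-Mar-2012": 31823776,
    "Fri, 6-Apr-2012": 34638976,
    "Sun, 6-May-2012": 35982976,
}

for day, total in totals.items():
    per_hour, per_minute, per_second = tweet_averages(total)
    print(day, per_hour, per_minute, per_second)
```

Integer division is used because the table lists whole numbers only; a different rounding rule would change some of the values by one.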
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
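
Comparing the two examples shows the naming convention of the conversion: each underscore in a JSON key is doubled in the XML element name, <code>null</code> values and empty arrays become empty elements, and booleans and numbers are stored as text. The following Python sketch mimics this mapping for illustration only. It is not the implementation of the XQuery JSON Module, the helper names <code>encode_name</code> and <code>to_xml</code> are assumptions, and it omits the type attributes on the root element as well as non-empty arrays.

```python
import json
import xml.etree.ElementTree as ET


def encode_name(key):
    """Double every underscore, as seen in the XML example above."""
    return key.replace("_", "__")


def to_xml(name, value):
    """Convert one JSON value into an element (simplified sketch)."""
    elem = ET.Element(encode_name(name))
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(key, child))
    elif isinstance(value, bool):
        # Must be checked before numbers: bool is a subclass of int.
        elem.text = "true" if value else "false"
    elif isinstance(value, list):
        if value:
            raise NotImplementedError("non-empty arrays are not covered here")
    elif value is not None and value != "":
        elem.text = str(value)
    return elem


tweet = json.loads(
    '{"retweet_count": 0, "geo": null,'
    ' "user": {"screen_name": "BaseX", "is_translator": false}}'
)
root = to_xml("json", tweet)
print(ET.tostring(root, encoding="unicode"))
```

In BaseX itself, the conversion is performed by the parse function of the JSON Module, which also records the original JSON types in the attributes of the root element.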
<br />
= BaseX Performance =<br />
<br />
The first test shows the time BaseX needs to insert large numbers of tweets into a database.<br />
Note: For this test, the <code>[[Options#AUTOFLUSH|AUTOFLUSH]]</code> option was set to <code>FALSE</code> (default: <code>SET AUTOFLUSH TRUE</code>).<br />
<br />
[[File:insertTweets.png]]</div>AWhttps://docs.basex.org/index.php?title=File:InsertTweets.png&diff=7394File:InsertTweets.png2012-05-29T14:23:21Z<p>AW: </p>
<hr />
<div></div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7393Twitter2012-05-29T13:47:54Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It is about the usage of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to Twitter's endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including their metadata, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a tweet both [[#Example Tweet (JSON)|as JSON]] and [[#Example Tweet (XML)|as XML]].<br />
The following section shows the amount of data per hour that the Twitter Streaming API delivers to connected endpoints with the 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre></div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7392Twitter2012-05-29T13:47:48Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It is about the usage of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to Twitter's endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including their metadata, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a tweet both [[#Example Tweet (JSON)|as JSON]] and [[#Example Tweet (XML)|as XML]].<br />
The following section shows the amount of data per hour that the Twitter Streaming API delivers to connected endpoints with the 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre></div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7391Twitter2012-05-29T13:47:32Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and presents some statistics on the Twitter data and the performance of BaseX.<br />
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
=BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, they have to be converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including their metadata, we use the standard ''insert'' expression of [[Updates|XQuery Update]].<br />
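<br />
The JSON-to-XML mapping can be illustrated with a minimal Python sketch. It is not the BaseX JSON Module itself; it only mimics the conventions visible in the XML example on this page (a <code>json</code> root element, doubled underscores in element names, empty elements for <code>null</code>). The function names and the <code>value</code> wrapper for array entries are assumptions made for illustration, and the type attributes on the root element are omitted.<br />

```python
import json
import xml.etree.ElementTree as ET


def encode_name(key):
    # Double each underscore, as in the XML example (created_at -> created__at).
    return key.replace("_", "__")


def fill(element, value):
    """Populate `element` with the XML counterpart of a decoded JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            fill(ET.SubElement(element, encode_name(key)), child)
    elif isinstance(value, list):
        for item in value:
            # Assumption: array entries are wrapped in <value> elements.
            fill(ET.SubElement(element, "value"), item)
    elif isinstance(value, bool):
        # Checked before numbers, because bool is a subclass of int in Python.
        element.text = "true" if value else "false"
    elif value is not None:
        element.text = str(value)
    # JSON null leaves the element empty.


def tweet_to_xml(raw):
    """Convert one JSON tweet (a string) into a serialized XML fragment."""
    root = ET.Element("json")
    fill(root, json.loads(raw))
    return ET.tostring(root, encoding="unicode")


sample = ('{"text": "Using BaseX", "geo": null, "retweeted": false, '
          '"user": {"screen_name": "BaseX", "followers_count": 860}}')
print(tweet_to_xml(sample))
```

The sketch assumes that every JSON key is already a valid XML name apart from its underscores; keys containing other special characters would need additional escaping.<br />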
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information, see the [https://dev.twitter.com/docs/platform-objects fields description]). Both representations are shown in the examples section ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data per hour that the Twitter Streaming API delivers to endpoints connected with 10% gardenhose access, measured on the 6th of February, March, April and May. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
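<br />
The averages in the table follow from the daily totals by integer division. The following Python sketch reproduces them under the assumption that each average is the daily total divided by the number of hours, minutes or seconds in a day and rounded down; the helper name <code>averages</code> is chosen for illustration. The dots in the table are thousands separators.<br />

```python
HOURS_PER_DAY = 24
MINUTES_PER_DAY = HOURS_PER_DAY * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60

# Daily totals taken from the statistics table.
totals = {
    "Mon, 6-Feb-2012": 30_824_976,
    "Tue, 6-Mar-2012": 31_823_776,
    "Fri, 6-Apr-2012": 34_638_976,
    "Sun, 6-May-2012": 35_982_976,
}


def averages(total):
    """Return the average tweets per hour, minute and second, rounded down."""
    return (
        total // HOURS_PER_DAY,
        total // MINUTES_PER_DAY,
        total // SECONDS_PER_DAY,
    )


for day, total in totals.items():
    per_hour, per_minute, per_second = averages(total)
    print(f"{day}: {per_hour} per hour, {per_minute} per minute, {per_second} per second")
```

For 6-Feb-2012, this yields 1284374 tweets per hour, 21406 per minute and 356 per second, which matches the table.<br />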
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre></div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7390Twitter2012-05-29T13:46:42Z<p>AW: </p>
<hr />
<div>This article is part of the [[Advanced User's Guide]]. As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article describes how BaseX can be used to process and store the live data stream of Twitter. We present some statistics on the Twitter data and the performance of BaseX.<br />
<br />
=BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, they have to be converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including their metadata, we use the standard ''insert'' expression of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information, see the [https://dev.twitter.com/docs/platform-objects fields description]). Both representations are shown in the examples section ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data per hour that the Twitter Streaming API delivers to endpoints connected with 10% gardenhose access, measured on the 6th of February, March, April and May. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre></div>AWhttps://docs.basex.org/index.php?title=Advanced_User%27s_Guide&diff=7015Advanced User's Guide2012-05-25T14:05:31Z<p>AW: </p>
<hr />
<div>This page is one of the [[Main Page|Main Sections]] of the documentation.<br />
It contains details on the BaseX storage and the Server architecture, and<br />
presents some more GUI features.<br />
<br />
<div style="float:left; width:48%;"><br />
===Storage===<br />
* [[Configuration]]: BaseX start files and directories<br />
* [[Indexes]]: Available index structures and their utilization<br />
* [[Backups]]: Backup and restore databases<br />
* [[Catalog Resolver]]: Information on entity resolving<br />
* [[Storage Layout]]: How data is stored in the database files<br />
<br />
===Use Cases===<br />
* [[Statistics]]: Exemplary statistics on databases created with BaseX<br />
* [[Twitter]]: Storing live tweets in BaseX<br/>&nbsp;<br />
</div><div style="float:left; width:4%;">&nbsp;<br />
</div><div style="float:left; width:48%;"><br />
<br />
===Server and Query Architecture===<br />
* [[User Management]]: User management in the client/server environment<br />
* [[Transaction Management]]: Insight into the BaseX transaction management<br />
* [[Logging]]: Description of the server logs<br />
* [[Events]]: Description of the event feature<br />
* [[Execution Plan]]: Analyzing query evaluation<br />
</div><br />
<div>&nbsp;</div><br />
[[Category:Server]] [[Category:GUI]]<br />
<br />
__NOTOC__</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7009Twitter2012-05-25T13:47:37Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article describes how BaseX can be used to process and store the live data stream of Twitter. We present some statistics on the Twitter data and the performance of BaseX.<br />
<br />
=BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, they have to be converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including their metadata, we use the standard ''insert'' expression of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information, see the [https://dev.twitter.com/docs/platform-objects fields description]). Both representations are shown in the examples section ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data per hour that the Twitter Streaming API delivers to endpoints connected with 10% gardenhose access, measured on the 6th of February, March, April and May. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|-<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre></div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7008Twitter2012-05-25T13:46:28Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article is about the use of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to Twitter's Streaming API endpoint and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets, including their metadata, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
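<br />
The name mapping visible in the example tweets (each underscore in a JSON key is doubled, and the JSON types are listed in attributes of the root element) can be illustrated with a short Python sketch. This is not the implementation of the JSON Module: the function names are hypothetical, the wrapping of array items in a <code>value</code> element is an assumption (the arrays in the example are empty), and attribute values are sorted here, which the module's output does not do.<br />

```python
import json
import xml.etree.ElementTree as ET

# Names of the root attributes that record the JSON types, as in the XML example.
TYPE_ATTRIBUTES = ("booleans", "numbers", "nulls", "arrays", "objects")


def escape_name(key: str) -> str:
    """Double every underscore, e.g. created_at -> created__at."""
    return key.replace("_", "__")


def build(parent: ET.Element, value, types: dict) -> None:
    """Fill `parent` with the XML form of `value` and record its JSON type."""
    name = parent.tag
    if isinstance(value, dict):
        types["objects"].add(name)
        for key, child in value.items():
            build(ET.SubElement(parent, escape_name(key)), child, types)
    elif isinstance(value, list):
        types["arrays"].add(name)
        for item in value:
            # Assumption: array items are wrapped in a <value> element.
            build(ET.SubElement(parent, "value"), item, types)
    elif isinstance(value, bool):
        # Checked before numbers because bool is a subclass of int in Python.
        types["booleans"].add(name)
        parent.text = "true" if value else "false"
    elif isinstance(value, (int, float)):
        types["numbers"].add(name)
        parent.text = str(value)
    elif value is None:
        types["nulls"].add(name)  # null becomes an empty element
    else:
        parent.text = str(value)  # strings are the default and are not listed


def tweet_to_xml(tweet_json: str) -> ET.Element:
    """Convert one JSON tweet into a <json> element with type attributes."""
    root = ET.Element("json")
    types = {kind: set() for kind in TYPE_ATTRIBUTES}
    build(root, json.loads(tweet_json), types)
    for kind, names in types.items():
        if names:
            root.set(kind, " ".join(sorted(names)))
    return root


if __name__ == "__main__":
    sample = '{"text": "Using BaseX", "retweet_count": 0, "geo": null}'
    print(ET.tostring(tweet_to_xml(sample), encoding="unicode"))
```

In BaseX itself, the conversion and the subsequent storage are done by the JSON Module and XQuery Update; the sketch only makes the naming and typing conventions of the resulting XML explicit.<br />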
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both forms ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows how many tweets per hour the Twitter Streaming API delivered to endpoints connected with 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|}<br />
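<br />
The averages in the table follow from the daily totals by dividing by 24 hours, 1,440 minutes and 86,400 seconds. The following Python sketch reproduces them; it assumes that fractions are dropped (floor division), which matches every value in the table. The names <code>DAILY_TOTALS</code> and <code>averages</code> are chosen for illustration only.<br />

```python
# Daily totals of tweets received via the 10% gardenhose access (from the table).
DAILY_TOTALS = {
    "Mon, 6-Feb-2012": 30_824_976,
    "Tue, 6-Mar-2012": 31_823_776,
    "Fri, 6-Apr-2012": 34_638_976,
    "Sun, 6-May-2012": 35_982_976,
}

HOURS_PER_DAY = 24
MINUTES_PER_DAY = HOURS_PER_DAY * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60


def averages(total: int) -> tuple:
    """Return the average tweets per hour, minute and second, rounded down."""
    return (
        total // HOURS_PER_DAY,
        total // MINUTES_PER_DAY,
        total // SECONDS_PER_DAY,
    )


if __name__ == "__main__":
    for day, total in DAILY_TOTALS.items():
        per_hour, per_minute, per_second = averages(total)
        print(day, total, per_hour, per_minute, per_second)
```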
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/basex.org",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://basex.org&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre></div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7007Twitter2012-05-25T13:44:06Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article is about the use of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to Twitter's Streaming API endpoint and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets, including their metadata, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both forms ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows how many tweets per hour the Twitter Streaming API delivered to endpoints connected with 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/adf.ly\/5ktAf",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://adf.ly/5ktAf&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre></div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7006Twitter2012-05-25T13:43:14Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article is about the use of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to Twitter's Streaming API endpoint and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets, including their metadata, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both forms ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows how many tweets per hour the Twitter Streaming API delivered to endpoints connected with 10% gardenhose access<br />
on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30.824.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.284.374<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21.406<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br/><br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31.823.776<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.325.990<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22.099<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br/><br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34.638.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.443.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.054<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br/><br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35.982.976<br/><br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1.499.290<br/><br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24.988<br/><br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br/><br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/adf.ly\/5ktAf",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://adf.ly/5ktAf&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
=BaseX Performance=</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7005Twitter2012-05-25T13:42:59Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article is about the use of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to Twitter's Streaming API endpoint and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, they have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets including their metadata, we use the standard ''insert'' expression of [[Updates|XQuery Update]].<br />
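The JSON-to-XML step can be approximated outside BaseX. The following Python sketch is not the implementation of the JSON Module; it only mimics the naming convention visible in the XML example of this article (underscores in keys are doubled, null values become empty elements). The function name <code>to_element</code> and the <code>value</code> element used for array items are assumptions made for illustration, and the type attributes that the module collects on the root element are omitted.

```python
import json
import xml.etree.ElementTree as ET


def to_element(name, value):
    """Build one XML element for a JSON value (illustrative approximation)."""
    # Underscores are doubled in element names, as in the XML example.
    elem = ET.Element(name.replace("_", "__"))
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_element(key, child))
    elif isinstance(value, list):
        # 'value' as the item element name is an assumption of this sketch.
        for item in value:
            elem.append(to_element("value", item))
    elif isinstance(value, bool):
        # Checked before numbers, because bool is a subclass of int in Python.
        elem.text = "true" if value else "false"
    elif value is not None and value != "":
        elem.text = str(value)
    # None and empty strings fall through and yield an empty element.
    return elem


# A shortened tweet with a few of the fields from the JSON example.
tweet = json.loads(
    '{"text": "Using BaseX for storing the Twitter Stream", '
    '"retweet_count": 0, "geo": null, '
    '"user": {"screen_name": "BaseX", "verified": false}}'
)
root = to_element("json", tweet)
print(ET.tostring(root, encoding="unicode"))
```

Within BaseX itself, the resulting fragment would then be added to a database with an XQuery Update ''insert'' expression instead of being printed.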
<br />
=Twitter's Streaming Data=<br />
<br />
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows a sample tweet in both versions ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]).<br />
The following section shows the amount of data that the Twitter Streaming API delivers per hour to endpoints connected with 10% gardenhose access,<br />
measured on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
==Statistics==<br />
<br />
[[File:Tweets.png]]<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Day<br />
! Description<br />
! Amount<br />
|-<br />
| Mon, 6-Feb-2012<br />
| Total tweets<br />
| 30,824,976<br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,284,374<br />
|-<br />
| <br />
| Average tweets per minute<br />
| 21,406<br />
|-<br />
| <br />
| Average tweets per second<br />
| 356<br />
|-<br />
| Tue, 6-Mar-2012<br />
| Total tweets<br />
| 31,823,776<br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,325,990<br />
|-<br />
| <br />
| Average tweets per minute<br />
| 22,099<br />
|-<br />
| <br />
| Average tweets per second<br />
| 368<br />
|-<br />
| Fri, 6-Apr-2012<br />
| Total tweets<br />
| 34,638,976<br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,443,290<br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24,054<br />
|-<br />
| <br />
| Average tweets per second<br />
| 400<br />
|-<br />
| Sun, 6-May-2012<br />
| Total tweets<br />
| 35,982,976<br />
|-<br />
| <br />
| Average tweets per hour<br />
| 1,499,290<br />
|-<br />
| <br />
| Average tweets per minute<br />
| 24,988<br />
|-<br />
| <br />
| Average tweets per second<br />
| 416<br />
|}<br />
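The averages in the table appear to be derived from the daily totals by dividing by the number of hours, minutes and seconds in a day and rounding down. The following Python sketch reproduces them under that assumption; the function name <code>daily_averages</code> is chosen for illustration only.

```python
HOURS_PER_DAY = 24
MINUTES_PER_DAY = HOURS_PER_DAY * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60


def daily_averages(total_tweets):
    """Return the average tweets per hour, minute and second, rounded down."""
    return {
        "per_hour": total_tweets // HOURS_PER_DAY,
        "per_minute": total_tweets // MINUTES_PER_DAY,
        "per_second": total_tweets // SECONDS_PER_DAY,
    }


# Daily totals taken from the table above.
totals = {
    "6-Feb-2012": 30824976,
    "6-Mar-2012": 31823776,
    "6-Apr-2012": 34638976,
    "6-May-2012": 35982976,
}

for day, total in totals.items():
    print(day, daily_averages(total))
```

As the stream is a 10% sample, the totals cover only about a tenth of all public tweets of the respective day.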
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/adf.ly\/5ktAf",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://adf.ly/5ktAf&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
=BaseX Performance=</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7004Twitter2012-05-25T10:46:23Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article is about the use of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to Twitter's Streaming API endpoint and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, they have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. The examples [[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]] show that each tweet is streamed as an object containing the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). To store the tweets including their metadata, we use the standard ''insert'' expression of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
The following figure shows the amount of data that the Twitter Streaming API delivers per hour to endpoints connected with 10% gardenhose access,<br />
measured on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
[[File:Tweets.png]]<br />
<br />
==Statistics==<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Type<br />
! Description<br />
! Example (native → hex integers)<br />
|-<br />
| {{Type|Num}}<br />
| Compressed integer (1-5 bytes), specified in [https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/util/Num.java Num.java]<br />
| {{Mono|15}} → {{Mono|0F}}; {{Mono|511}} → {{Mono|41 FF}}<br/><br />
|-<br />
| {{Type|Token}}<br />
| Length ({{Type|Num}}) and bytes of UTF8 byte representation<br />
| {{Mono|Hello}} → {{Mono|05 48 65 6c 6c 6f}}<br />
|-<br />
| {{Type|Double}}<br />
| Number, stored as token<br />
| {{Mono|123}} → {{Mono|03 31 32 33}}<br />
|-<br />
| {{Type|Boolean}}<br />
| Boolean (1 byte, {{Mono|00}} or {{Mono|01}})<br />
| {{Mono|true}} → {{Mono|01}}<br />
|-<br />
| {{Type|Nums}}, {{Type|Tokens}}, {{Type|Doubles}}<br />
| Arrays of values, introduced with the number of entries<br />
| {{Mono|1,2}} → {{Mono|02 01 31 01 32}}<br />
|-<br />
| {{Type|TokenSet}}<br />
| Key array ({{Type|Tokens}}), next/bucket/size arrays (3x {{Type|Nums}})<br />
|<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/adf.ly\/5ktAf",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://adf.ly/5ktAf&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
=BaseX Performance=</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7003Twitter2012-05-25T09:59:05Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article is about the use of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to Twitter's Streaming API endpoint and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, they have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. The examples [[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]] show that each tweet is streamed as an object containing the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). To store the tweets including their metadata, we use the standard ''insert'' expression of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
The following figure shows the amount of data that the Twitter Streaming API delivers per hour to endpoints connected with 10% gardenhose access,<br />
measured on the 6th of February, March, April and May 2012. It is the pure public live stream without any filtering applied.<br />
<br />
[[File:Tweets.png]]<br />
<br />
==Statistics about the data==<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Type<br />
! Description<br />
! Example (native → hex integers)<br />
|-<br />
| {{Type|Num}}<br />
| Compressed integer (1-5 bytes), specified in [https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/util/Num.java Num.java]<br />
| {{Mono|15}} → {{Mono|0F}}; {{Mono|511}} → {{Mono|41 FF}}<br/><br />
|-<br />
| {{Type|Token}}<br />
| Length ({{Type|Num}}) and bytes of UTF8 byte representation<br />
| {{Mono|Hello}} → {{Mono|05 48 65 6c 6c 6f}}<br />
|-<br />
| {{Type|Double}}<br />
| Number, stored as token<br />
| {{Mono|123}} → {{Mono|03 31 32 33}}<br />
|-<br />
| {{Type|Boolean}}<br />
| Boolean (1 byte, {{Mono|00}} or {{Mono|01}})<br />
| {{Mono|true}} → {{Mono|01}}<br />
|-<br />
| {{Type|Nums}}, {{Type|Tokens}}, {{Type|Doubles}}<br />
| Arrays of values, introduced with the number of entries<br />
| {{Mono|1,2}} → {{Mono|02 01 31 01 32}}<br />
|-<br />
| {{Type|TokenSet}}<br />
| Key array ({{Type|Tokens}}), next/bucket/size arrays (3x {{Type|Nums}})<br />
|<br />
|}<br />
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/adf.ly\/5ktAf",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://adf.ly/5ktAf&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
=BaseX Performance=</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7002Twitter2012-05-25T09:58:15Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become a really exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving the data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article is about the use of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. The examples [[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]] show that<br />
each tweet is streamed as an object containing the tweet message itself and over 60 data fields (for further information see the [https://dev.twitter.com/docs/platform-objects fields description]). To store the tweets including the metadata, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
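
The mapping from a JSON tweet to an XML fragment can be illustrated with a small Python sketch. It is not the JSON Module itself; it only mimics the conventions visible in the example tweets on this page (underscores in names are doubled, null values become empty elements, and the root element lists the non-string types). The helper names <code>encode_name</code> and <code>to_xml</code> are hypothetical, and the <code>value</code> element used for array items is an assumption, since the example arrays are all empty.

```python
import json
import xml.etree.ElementTree as ET


def encode_name(key: str) -> str:
    """Double each underscore, as in the example (created_at -> created__at)."""
    return key.replace("_", "__")


def to_xml(data: dict) -> ET.Element:
    """Build a <json> element that mirrors the layout of the XML example."""
    # Names per type, in order of first occurrence and without duplicates.
    types = {"booleans": [], "numbers": [], "nulls": [], "arrays": [], "objects": []}

    def note(kind: str, name: str) -> None:
        if name not in types[kind]:
            types[kind].append(name)

    def build(name: str, value) -> ET.Element:
        elem = ET.Element(name)
        if isinstance(value, dict):
            note("objects", name)
            for key, child in value.items():
                elem.append(build(encode_name(key), child))
        elif isinstance(value, list):
            note("arrays", name)
            for item in value:
                # Assumption: array items are wrapped in <value> elements.
                elem.append(build("value", item))
        elif isinstance(value, bool):  # must be tested before int
            note("booleans", name)
            elem.text = "true" if value else "false"
        elif value is None:
            note("nulls", name)
        elif isinstance(value, (int, float)):
            note("numbers", name)
            elem.text = str(value)
        else:
            # Strings are the default type; an empty string yields an empty element.
            elem.text = value or None
        return elem

    root = build("json", data)
    for kind, names in types.items():
        if names:
            root.set(kind, " ".join(names))
    return root


tweet = json.loads(
    '{"text": "Using BaseX", "retweeted": false, "geo": null,'
    ' "retweet_count": 0, "entities": {"urls": []},'
    ' "user": {"screen_name": "BaseX", "location": ""}}'
)
root = to_xml(tweet)
print(ET.tostring(root, encoding="unicode"))
```

In BaseX itself, this step is performed by the parse function of the JSON Module, and the resulting fragment is stored with an XQuery Update ''insert'' expression.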
<br />
=Twitter's Streaming Data=<br />
<br />
The following figure shows the amount of data delivered per hour by the Twitter Streaming API to connected endpoints with 10% gardenhose access<br />
on the 6th of February, March, April and May. It is the pure public live stream without any filtering applied.<br />
<br />
[[File:Tweets.png]]<br />
<br />
==Statistics about the data==<br />
<br />
{| class="wikitable" width="50%"<br />
|-<br />
! Type<br />
! Description<br />
! Example (native → hex integers)<br />
|-<br />
| {{Type|Num}}<br />
| Compressed integer (1-5 bytes), specified in [https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/util/Num.java Num.java]<br />
| {{Mono|15}} → {{Mono|0F}}; {{Mono|511}} → {{Mono|41 FF}}<br/><br />
|-<br />
| {{Type|Token}}<br />
| Length ({{Type|Num}}) and bytes of UTF8 byte representation<br />
| {{Mono|Hello}} → {{Mono|05 48 65 6c 6c 6f}}<br />
|-<br />
| {{Type|Double}}<br />
| Number, stored as token<br />
| {{Mono|123}} → {{Mono|03 31 32 33}}<br />
|-<br />
| {{Type|Boolean}}<br />
| Boolean (1 byte, {{Mono|00}} or {{Mono|01}})<br />
| {{Mono|true}} → {{Mono|01}}<br />
|-<br />
| {{Type|Nums}}, {{Type|Tokens}}, {{Type|Doubles}}<br />
| Arrays of values, introduced with the number of entries<br />
| {{Mono|1,2}} → {{Mono|02 01 31 01 32}}<br />
|-<br />
| {{Type|TokenSet}}<br />
| Key array ({{Type|Tokens}}), next/bucket/size arrays (3x {{Type|Nums}})<br />
|<br />
|}<br />
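
The encodings in the table can be reproduced with a short Python sketch. It is checked only against the example values in the table; the 4-byte and 5-byte layouts of {{Type|Num}} are an assumption about Num.java and are not confirmed by the table. The function names are hypothetical.

```python
def encode_num(value: int) -> bytes:
    """Compressed integer; the two highest bits of the first byte select the length."""
    if value < 0 or value > 0x7FFFFFFF:
        raise ValueError("only non-negative 32-bit integers are handled in this sketch")
    if value <= 0x3F:
        # 1 byte: 00xxxxxx
        return bytes([value])
    if value <= 0x3FFF:
        # 2 bytes: 01xxxxxx xxxxxxxx
        return bytes([0x40 | (value >> 8), value & 0xFF])
    if value <= 0x3FFFFFFF:
        # Assumption: 4 bytes, first byte 10xxxxxx
        return bytes([0x80 | (value >> 24), (value >> 16) & 0xFF,
                      (value >> 8) & 0xFF, value & 0xFF])
    # Assumption: 5 bytes, marker byte C0 followed by the 4-byte value
    return bytes([0xC0]) + value.to_bytes(4, "big")


def encode_token(text: str) -> bytes:
    """Length as Num, followed by the UTF-8 bytes."""
    raw = text.encode("utf-8")
    return encode_num(len(raw)) + raw


def encode_tokens(items: list) -> bytes:
    """Array of tokens, introduced with the number of entries."""
    return encode_num(len(items)) + b"".join(encode_token(item) for item in items)


print(encode_num(511).hex(" "))
print(encode_token("Hello").hex(" "))
print(encode_tokens(["1", "2"]).hex(" "))
```

A {{Type|Double}} is stored as a token, so <code>encode_token("123")</code> covers the third row of the table.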
<br />
==Example Tweet (JSON)==<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/adf.ly\/5ktAf",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
==Example Tweet (XML)==<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://adf.ly/5ktAf&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
=BaseX Performance=</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7001Twitter2012-05-25T09:50:13Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become a really exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving the data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article is about the use of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets, we use the standard ''insert'' function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
The following figure shows the amount of data delivered per hour by the Twitter Streaming API to connected endpoints with 10% gardenhose access<br />
on the 6th of February, March, April and May. It is the pure public live stream without any filtering applied.<br />
<br />
[[File:Tweets.png]]<br />
<br />
Statistics about the data:<br />
<br />
Example Tweet (JSON):<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/adf.ly\/5ktAf",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
Example Tweet (XML):<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://adf.ly/5ktAf&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
=BaseX Performance=</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=7000Twitter2012-05-25T09:49:46Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become a really exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving the data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article is about the use of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. To store the tweets, we use the standard insert function of [[Updates|XQuery Update]].<br />
<br />
=Twitter's Streaming Data=<br />
<br />
The following figure shows the amount of data delivered per hour by the Twitter Streaming API to connected endpoints with 10% gardenhose access<br />
on the 6th of February, March, April and May. It is the pure public live stream without any filtering applied.<br />
<br />
[[File:Tweets.png]]<br />
<br />
Statistics about the data:<br />
<br />
Example Tweet (JSON):<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/adf.ly\/5ktAf",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
Example Tweet (XML):<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://adf.ly/5ktAf&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
=BaseX Performance=</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=6999Twitter2012-05-25T09:48:34Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become a really exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving the data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article is about the use of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
= BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be<br />
converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used.<br />
<br />
=Twitter's Streaming Data=<br />
<br />
The following figure shows the amount of data delivered per hour by the Twitter Streaming API to connected endpoints with 10% gardenhose access<br />
on the 6th of February, March, April and May. It is the pure public live stream without any filtering applied.<br />
<br />
[[File:Tweets.png]]<br />
<br />
Statistics about the data:<br />
<br />
Example Tweet (JSON):<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/adf.ly\/5ktAf",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
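
The <code>created_at</code> fields in the example use a fixed textual timestamp layout. As a minimal sketch, assuming this layout holds for all tweets and that the English day and month abbreviations are parsed under the default C locale, the value can be turned into a timezone-aware timestamp as follows. The names <code>TWITTER_TIME_FORMAT</code> and <code>parse_created_at</code> are hypothetical.

```python
from datetime import datetime, timezone

# Layout of the created_at values shown in the example tweet (an assumption
# derived from that example, not from an official format definition).
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value: str) -> datetime:
    """Parse a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)


created = parse_created_at("Fri May 04 13:17:16 +0000 2012")
print(created.isoformat())  # 2012-05-04T13:17:16+00:00
```

A normalized timestamp of this kind makes it easier to group tweets per hour, which is the granularity used in the figure above.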
<br />
Example Tweet (XML):<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://adf.ly/5ktAf&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
=BaseX Performance=</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=6998Twitter2012-05-25T09:47:57Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for<br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article is about the use of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
=BaseX as Twitter Storage=<br />
<br />
To retrieve the Twitter stream, we connect to Twitter's Streaming API endpoint and receive a never-ending stream of tweets. Twitter delivers the tweets as [http://www.json.org/ JSON] objects, which have to be<br />
converted by BaseX into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used.<br />
<br />
=Twitter's Streaming Data=<br />
<br />
The following figure shows the amount of data per hour that the Twitter Streaming API delivers to endpoints connected with the 10% gardenhose access,<br />
measured on the 6th of February, March, April and May. It is the pure public live stream without any filtering applied.<br />
<br />
[[File:Tweets.png]]<br />
<br />
Statistics about the data:<br />
<br />
Example Tweet (JSON):<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/adf.ly\/5ktAf",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
Example Tweet (XML):<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://adf.ly/5ktAf&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
=BaseX Performance=</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=6997Twitter2012-05-25T09:34:57Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for<br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article is about the use of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
=Twitter's Streaming Data=<br />
<br />
The following figure shows the amount of data per hour that the Twitter Streaming API delivers to endpoints connected with the 10% gardenhose access,<br />
measured on the 6th of February, March, April and May. It is the pure public live stream without any filtering applied.<br />
<br />
[[File:Tweets.png]]<br />
<br />
Statistics about the data:<br />
<br />
Example Tweet (JSON):<br />
<br />
<pre><br />
<br />
{<br />
"contributors": null,<br />
"text": "Using BaseX for storing the Twitter Stream",<br />
"geo": null,<br />
"retweeted": false,<br />
"in_reply_to_screen_name": null,<br />
"possibly_sensitive": false,<br />
"truncated": false,<br />
"entities": {<br />
"urls": [<br />
],<br />
"hashtags": [<br />
],<br />
"user_mentions": [<br />
]<br />
},<br />
"in_reply_to_status_id_str": null,<br />
"id": 1984009055807*****,<br />
"in_reply_to_user_id_str": null,<br />
"source": "&lt;a href=\"http:\/\/twitterfeed.com\" rel=\"nofollow\"&gt;twitterfeed&lt;\/a&gt;",<br />
"favorited": false,<br />
"in_reply_to_status_id": null,<br />
"retweet_count": 0,<br />
"created_at": "Fri May 04 13:17:16 +0000 2012",<br />
"in_reply_to_user_id": null,<br />
"possibly_sensitive_editable": true,<br />
"id_str": "1984009055807*****",<br />
"place": null,<br />
"user": {<br />
"location": "",<br />
"default_profile": true,<br />
"statuses_count": 9096,<br />
"profile_background_tile": false,<br />
"lang": "en",<br />
"profile_link_color": "0084B4",<br />
"id": 5024566**,<br />
"following": null,<br />
"protected": false,<br />
"favourites_count": 0,<br />
"profile_text_color": "333333",<br />
"contributors_enabled": false,<br />
"verified": false,<br />
"description": "http:\/\/adf.ly\/5ktAf",<br />
"profile_sidebar_border_color": "C0DEED",<br />
"name": "BaseX",<br />
"profile_background_color": "C0DEED",<br />
"created_at": "Sat Feb 25 04:05:30 +0000 2012",<br />
"default_profile_image": true,<br />
"followers_count": 860,<br />
"geo_enabled": false,<br />
"profile_image_url_https": "https:\/\/si0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"profile_background_image_url": "http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"profile_background_image_url_https": "https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png",<br />
"follow_request_sent": null,<br />
"url": "http:\/\/adf.ly\/5ktAf",<br />
"utc_offset": null,<br />
"time_zone": null,<br />
"notifications": null,<br />
"friends_count": 2004,<br />
"profile_use_background_image": true,<br />
"profile_sidebar_fill_color": "DDEEF6",<br />
"screen_name": "BaseX",<br />
"id_str": "5024566**",<br />
"show_all_inline_media": false,<br />
"profile_image_url": "http:\/\/a0.twimg.com\/sticky\/default_profile_images\/default_profile_0_normal.png",<br />
"is_translator": false,<br />
"listed_count": 0<br />
},<br />
"coordinates": null<br />
}<br />
</pre><br />
<br />
Example Tweet (XML):<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://adf.ly/5ktAf&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
=BaseX Performance=</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=6996Twitter2012-05-25T09:30:35Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for<br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article is about the use of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
=Twitter's Streaming Data=<br />
<br />
The following figure shows the amount of data per hour that the Twitter Streaming API delivers to endpoints connected with the 10% gardenhose access,<br />
measured on the 6th of February, March, April and May. It is the pure public live stream without any filtering applied.<br />
<br />
[[File:Tweets.png]]<br />
<br />
Statistics about the data:<br />
<br />
Example Tweet (JSON):<br />
<br />
Example Tweet (XML):<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions"<br />
objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Using BaseX for storing the Twitter Stream&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls/&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;1984009055807*****&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;1984009055807*****&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;5024566**&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://adf.ly/5ktAf&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;BaseX&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;BaseX&lt;/screen__name&gt;<br />
&lt;id__str&gt;5024566**&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
=BaseX Performance=</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=6995Twitter2012-05-25T09:23:59Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for<br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article is about the use of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
=Twitter's Streaming Data=<br />
<br />
The following figure shows the amount of data per hour that the Twitter Streaming API delivers to endpoints connected with the 10% gardenhose access,<br />
measured on the 6th of February, March, April and May. It is the pure public live stream without any filtering applied.<br />
<br />
[[File:Tweets.png]]<br />
<br />
Statistics about the data:<br />
<br />
Example Tweet (JSON):<br />
<br />
Example Tweet (XML):<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited <br />
possibly__sensitive__editable default__profile profile__background__tile protected <br />
contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str <br />
in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions" objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Person Of Interest S01E21 480p HDTV x264-SM mkv: http://t.co/8y4sZGXn&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls&gt;<br />
&lt;value type="object"&gt;<br />
&lt;expanded__url&gt;http://adf.ly/88khx&lt;/expanded__url&gt;<br />
&lt;indices&gt;<br />
&lt;value type="number"&gt;50&lt;/value&gt;<br />
&lt;value type="number"&gt;70&lt;/value&gt;<br />
&lt;/indices&gt;<br />
&lt;display__url&gt;adf.ly/88khx&lt;/display__url&gt;<br />
&lt;url&gt;http://t.co/8y4sZGXn&lt;/url&gt;<br />
&lt;/value&gt;<br />
&lt;/urls&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;198400905580781568&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;198400905580781568&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;502456605&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://adf.ly/5ktAf&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;sweetys music&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;sweetysmusic&lt;/screen__name&gt;<br />
&lt;id__str&gt;502456605&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
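<br />
In the XML above, the root element lists which element names hold booleans, numbers and nulls, and underscores in the JSON field names appear doubled (<code>retweet_count</code> becomes <code>retweet__count</code>). The following Python sketch shows how a consumer might use that information to recover typed values. It is an illustration under assumptions, not part of BaseX: the helper names <code>typed_value</code> and <code>json_key</code> are hypothetical, the fragment is a shortened copy of the example tweet, and numbers are read as integers because all numbers in the example are integers.<br />

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example tweet; the type lists are trimmed to match.
FRAGMENT = """<json booleans="retweeted verified"
      numbers="id retweet__count followers__count"
      nulls="geo" objects="json user">
  <text>Person Of Interest S01E21 480p HDTV x264-SM mkv: http://t.co/8y4sZGXn</text>
  <geo/>
  <retweeted>false</retweeted>
  <retweet__count>0</retweet__count>
  <user>
    <followers__count>860</followers__count>
    <verified>false</verified>
    <screen__name>sweetysmusic</screen__name>
  </user>
</json>"""


def typed_value(root, element):
    """Interpret an element's text using the type lists on the root."""
    def names(attribute):
        return root.get(attribute, "").split()

    if element.tag in names("nulls"):
        return None
    if element.tag in names("booleans"):
        return element.text == "true"
    if element.tag in names("numbers"):
        return int(element.text)  # assumption: integer values only
    return element.text


def json_key(tag):
    """Map an element name back to the JSON key (double underscore to single)."""
    return tag.replace("__", "_")


root = ET.fromstring(FRAGMENT)
# Top-level leaf fields of the tweet (nested objects such as user are skipped).
tweet = {json_key(c.tag): typed_value(root, c) for c in root if len(c) == 0}
# Fields of the nested user object.
user = {json_key(c.tag): typed_value(root, c) for c in root.find("user")}
```

Keeping the type information on the root element means the individual elements stay free of per-element type attributes, at the cost of one lookup per element when values are read back.<br />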
<br />
=BaseX Performance=</div>AWhttps://docs.basex.org/index.php?title=Twitter&diff=6994Twitter2012-05-25T09:22:43Z<p>AW: </p>
<hr />
<div>As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for <br />
all kinds of analytics. Twitter provides the developer community with a set of [https://dev.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.<br />
<br />
This article is about the use of BaseX for processing and storing the live data stream of Twitter. We illustrate some statistics about the Twitter data and the performance of BaseX.<br />
<br />
=Twitter's Streaming Data=<br />
<br />
The following figure shows the amount of data per hour that the Twitter Streaming API delivers to endpoints connected with the 10% gardenhose access,<br />
measured on the 6th of February, March, April and May. It is the pure public live stream without any filtering applied.<br />
<br />
[[File:Tweets.png]]<br />
<br />
Statistics about the data:<br />
<br />
Example Tweet (JSON):<br />
<br />
Example Tweet (XML):<br />
<br />
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited <br />
possibly__sensitive__editable default__profile profile__background__tile protected<br />
contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator" <br />
numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"<br />
nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str <br />
in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates" <br />
arrays="urls indices hashtags user__mentions" objects="json entities user"&gt;<br />
&lt;contributors/&gt;<br />
&lt;text&gt;Person Of Interest S01E21 480p HDTV x264-SM mkv: http://t.co/8y4sZGXn&lt;/text&gt;<br />
&lt;geo/&gt;<br />
&lt;retweeted&gt;false&lt;/retweeted&gt;<br />
&lt;in__reply__to__screen__name/&gt;<br />
&lt;possibly__sensitive&gt;false&lt;/possibly__sensitive&gt;<br />
&lt;truncated&gt;false&lt;/truncated&gt;<br />
&lt;entities&gt;<br />
&lt;urls&gt;<br />
&lt;value type="object"&gt;<br />
&lt;expanded__url&gt;http://adf.ly/88khx&lt;/expanded__url&gt;<br />
&lt;indices&gt;<br />
&lt;value type="number"&gt;50&lt;/value&gt;<br />
&lt;value type="number"&gt;70&lt;/value&gt;<br />
&lt;/indices&gt;<br />
&lt;display__url&gt;adf.ly/88khx&lt;/display__url&gt;<br />
&lt;url&gt;http://t.co/8y4sZGXn&lt;/url&gt;<br />
&lt;/value&gt;<br />
&lt;/urls&gt;<br />
&lt;hashtags/&gt;<br />
&lt;user__mentions/&gt;<br />
&lt;/entities&gt;<br />
&lt;in__reply__to__status__id__str/&gt;<br />
&lt;id&gt;198400905580781568&lt;/id&gt;<br />
&lt;in__reply__to__user__id__str/&gt;<br />
&lt;source&gt;&lt;a href="http://twitterfeed.com" rel="nofollow"&gt;twitterfeed&lt;/a&gt;&lt;/source&gt;<br />
&lt;favorited&gt;false&lt;/favorited&gt;<br />
&lt;in__reply__to__status__id/&gt;<br />
&lt;retweet__count&gt;0&lt;/retweet__count&gt;<br />
&lt;created__at&gt;Fri May 04 13:17:16 +0000 2012&lt;/created__at&gt;<br />
&lt;in__reply__to__user__id/&gt;<br />
&lt;possibly__sensitive__editable&gt;true&lt;/possibly__sensitive__editable&gt;<br />
&lt;id__str&gt;198400905580781568&lt;/id__str&gt;<br />
&lt;place/&gt;<br />
&lt;user&gt;<br />
&lt;location/&gt;<br />
&lt;default__profile&gt;true&lt;/default__profile&gt;<br />
&lt;statuses__count&gt;9096&lt;/statuses__count&gt;<br />
&lt;profile__background__tile&gt;false&lt;/profile__background__tile&gt;<br />
&lt;lang&gt;en&lt;/lang&gt;<br />
&lt;profile__link__color&gt;0084B4&lt;/profile__link__color&gt;<br />
&lt;id&gt;502456605&lt;/id&gt;<br />
&lt;following/&gt;<br />
&lt;protected&gt;false&lt;/protected&gt;<br />
&lt;favourites__count&gt;0&lt;/favourites__count&gt;<br />
&lt;profile__text__color&gt;333333&lt;/profile__text__color&gt;<br />
&lt;contributors__enabled&gt;false&lt;/contributors__enabled&gt;<br />
&lt;verified&gt;false&lt;/verified&gt;<br />
&lt;description&gt;http://adf.ly/5ktAf&lt;/description&gt;<br />
&lt;profile__sidebar__border__color&gt;C0DEED&lt;/profile__sidebar__border__color&gt;<br />
&lt;name&gt;sweetys music&lt;/name&gt;<br />
&lt;profile__background__color&gt;C0DEED&lt;/profile__background__color&gt;<br />
&lt;created__at&gt;Sat Feb 25 04:05:30 +0000 2012&lt;/created__at&gt;<br />
&lt;default__profile__image&gt;true&lt;/default__profile__image&gt;<br />
&lt;followers__count&gt;860&lt;/followers__count&gt;<br />
&lt;geo__enabled&gt;false&lt;/geo__enabled&gt;<br />
&lt;profile__image__url__https&gt;https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url__https&gt;<br />
&lt;profile__background__image__url&gt;http://a0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url&gt;<br />
&lt;profile__background__image__url__https&gt;https://si0.twimg.com/images/themes/theme1/bg.png&lt;/profile__background__image__url__https&gt;<br />
&lt;follow__request__sent/&gt;<br />
&lt;url&gt;http://adf.ly/5ktAf&lt;/url&gt;<br />
&lt;utc__offset/&gt;<br />
&lt;time__zone/&gt;<br />
&lt;notifications/&gt;<br />
&lt;friends__count&gt;2004&lt;/friends__count&gt;<br />
&lt;profile__use__background__image&gt;true&lt;/profile__use__background__image&gt;<br />
&lt;profile__sidebar__fill__color&gt;DDEEF6&lt;/profile__sidebar__fill__color&gt;<br />
&lt;screen__name&gt;sweetysmusic&lt;/screen__name&gt;<br />
&lt;id__str&gt;502456605&lt;/id__str&gt;<br />
&lt;show__all__inline__media&gt;false&lt;/show__all__inline__media&gt;<br />
&lt;profile__image__url&gt;http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png&lt;/profile__image__url&gt;<br />
&lt;is__translator&gt;false&lt;/is__translator&gt;<br />
&lt;listed__count&gt;0&lt;/listed__count&gt;<br />
&lt;/user&gt;<br />
&lt;coordinates/&gt;<br />
&lt;/json&gt;</pre><br />
<br />
=BaseX Performance=</div>AW