Changes

723 bytes removed, 13:18, 2 July 2020
This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter, and presents some statistics about the Twitter data and the performance of BaseX.
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for all kinds of analytics. Twitter provides the developer community with a set of [https://developer.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.
=BaseX as Twitter Storage=
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [https://www.json.org/ JSON] objects, the data is converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. In the examples section both versions are shown ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]). To store the tweets, including their metadata, we use the standard ''insert'' function of [[Updates|XQuery Update]].
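The conversion and storage steps described above can be sketched in XQuery as follows. This is a minimal illustration under stated assumptions, not the code used for this article: the JSON string is a shortened, hypothetical tweet, the <code>tweets</code> container element is an assumed name, and the <code>merge</code> option of <code>json:parse</code> (which collects type information in attributes of the root element) is assumed to be available in the BaseX version in use. The tweet is inserted into an in-memory copy, so the query can be run without an existing database.

<syntaxhighlight lang="xquery">
(: Shortened, hypothetical tweet; real tweets carry many more fields :)
let $json := '{ "text": "Using BaseX for storing the Twitter Stream", "retweeted": false, "retweet_count": 0 }'
(: Convert the JSON string to XML; the 'merge' option is an assumption :)
let $tweet := json:parse($json, map { 'merge': true() })
(: Insert the converted tweet into an in-memory container (assumed name) :)
return copy $store := <tweets/>
modify insert node $tweet into $store
return $store
</syntaxhighlight>

Against a persistent database, the same ''insert'' expression would target a node of that database instead of the in-memory copy.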
=Twitter’s Streaming Data=
==Example Tweet (XML)==
<syntaxhighlight lang="xml">
<json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator"
      numbers="id retweet__count statuses__count favourites__count followers__count friends__count listed__count"
      nulls="contributors geo in__reply__to__screen__name in__reply__to__status__id__str in__reply__to__user__id__str in__reply__to__status__id in__reply__to__user__id place following follow__request__sent utc__offset time__zone notifications coordinates"
      arrays="urls indices hashtags user__mentions"
      objects="json entities user">
  <contributors/>
  <text>Using BaseX for storing the Twitter Stream</text>
  <geo/>
  <retweeted>false</retweeted>
  <in__reply__to__screen__name/>
  <possibly__sensitive>false</possibly__sensitive>
  <truncated>false</truncated>
  <entities>
    <urls/>
    <hashtags/>
    <user__mentions/>
  </entities>
  <in__reply__to__status__id__str/>
  <id>1984009055807*****</id>
  <in__reply__to__user__id__str/>
  <source><a href="http://twitterfeed.com" rel="nofollow">twitterfeed</a></source>
  <favorited>false</favorited>
  <in__reply__to__status__id/>
  <retweet__count>0</retweet__count>
  <created__at>Fri May 04 13:17:16 +0000 2012</created__at>
  <in__reply__to__user__id/>
  <possibly__sensitive__editable>true</possibly__sensitive__editable>
  <id__str>1984009055807*****</id__str>
  <place/>
  <user>
    <location/>
    <default__profile>true</default__profile>
    <statuses__count>9096</statuses__count>
    <profile__background__tile>false</profile__background__tile>
    <lang>en</lang>
    <profile__link__color>0084B4</profile__link__color>
    <id>5024566**</id>
    <following/>
    <protected>false</protected>
    <favourites__count>0</favourites__count>
    <profile__text__color>333333</profile__text__color>
    <contributors__enabled>false</contributors__enabled>
    <verified>false</verified>
    <description>http://basex.org</description>
    <profile__sidebar__border__color>C0DEED</profile__sidebar__border__color>
    <name>BaseX</name>
    <profile__background__color>C0DEED</profile__background__color>
    <created__at>Sat Feb 25 04:05:30 +0000 2012</created__at>
    <default__profile__image>true</default__profile__image>
    <followers__count>860</followers__count>
    <geo__enabled>false</geo__enabled>
    <profile__image__url__https>https://si0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png</profile__image__url__https>
    <profile__background__image__url>http://a0.twimg.com/images/themes/theme1/bg.png</profile__background__image__url>
    <profile__background__image__url__https>https://si0.twimg.com/images/themes/theme1/bg.png</profile__background__image__url__https>
    <follow__request__sent/>
    <url>http://adf.ly/5ktAf</url>
    <utc__offset/>
    <time__zone/>
    <notifications/>
    <friends__count>2004</friends__count>
    <profile__use__background__image>true</profile__use__background__image>
    <profile__sidebar__fill__color>DDEEF6</profile__sidebar__fill__color>
    <screen__name>BaseX</screen__name>
    <id__str>5024566**</id__str>
    <show__all__inline__media>false</show__all__inline__media>
    <profile__image__url>http://a0.twimg.com/sticky/default_profile_images/default_profile_0_normal.png</profile__image__url>
    <is__translator>false</is__translator>
    <listed__count>0</listed__count>
  </user>
  <coordinates/>
</json>
</syntaxhighlight>
= BaseX Performance =