Changes

1,236 bytes added ,  10:58, 25 May 2012
To retrieve the Twitter stream, we connect to Twitter's Streaming API endpoint and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, the objects have to be converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. The examples [[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]] show that each tweet is streamed as an object containing the tweet message itself and over 60 data fields (for further information, see the [https://dev.twitter.com/docs/platform-objects fields description]). To store the tweets including their metadata, we use the standard ''insert'' function of [[Updates|XQuery Update]].
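The JSON-to-XML conversion step can be illustrated outside of XQuery. The following Python sketch is not the JSON Module's implementation; it only approximates the output shape visible in the XML example on this page. It assumes that underscores in JSON keys are doubled in element names (as in <code>possibly__sensitive</code>) and that the names of boolean fields are collected in a <code>booleans</code> attribute on the root element. The function names <code>escape_name</code> and <code>json_to_xml</code>, the use of <code>value</code> elements for array items, and the sample tweet are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def escape_name(key: str) -> str:
    # Assumption: underscores are doubled, as in "possibly__sensitive".
    return key.replace("_", "__")


def json_to_xml(text: str) -> ET.Element:
    """Convert a JSON document into a <json> element (simplified sketch)."""
    root = ET.Element("json")
    booleans = []  # names of elements that hold boolean values

    def fill(parent, value, name):
        if isinstance(value, dict):
            for key, child in value.items():
                child_name = escape_name(key)
                fill(ET.SubElement(parent, child_name), child, child_name)
        elif isinstance(value, list):
            # Assumption: array items become <value> elements.
            for item in value:
                fill(ET.SubElement(parent, "value"), item, "value")
        elif isinstance(value, bool):
            parent.text = "true" if value else "false"
            if name not in booleans:
                booleans.append(name)
        elif value is not None:
            parent.text = str(value)
        # null values are left as empty elements

    fill(root, json.loads(text), "json")
    if booleans:
        root.set("booleans", " ".join(booleans))
    return root


# Hypothetical, heavily shortened tweet; real tweets carry over 60 fields.
sample = '{"text": "Hello", "retweeted": false, "user": {"geo_enabled": true, "id": 7}}'
print(ET.tostring(json_to_xml(sample), encoding="unicode"))
```

Collecting the type information in a root attribute keeps the element content itself plain text, which is why the XML form of a tweet can be stored and queried like any other XML fragment.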
=Twitter's Streaming Data=
The following figure shows the amount of data per hour that the Twitter Streaming API delivered to connected endpoints with the 10% gardenhose access on the 6th of February, March, April and May. It is the pure public live stream without any filtering applied.
[[File:Tweets.png]]
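An hourly breakdown like the one in the figure can be derived from the <code>created_at</code> field of each tweet. The following Python sketch shows one possible way to do this; it is not the procedure used to produce the figure. The function name <code>tweets_per_hour</code> and the sample timestamps are made up for illustration, and the timestamp layout is assumed to be the one Twitter uses in its JSON objects (for example <code>Sun May 06 10:15:00 +0000 2012</code>).

```python
from collections import Counter
from datetime import datetime

# Assumed layout of the created_at field, e.g. "Sun May 06 10:15:00 +0000 2012".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_hour(created_at_values):
    """Count tweets per hour, keyed by 'YYYY-MM-DD HH:00'."""
    counts = Counter()
    for value in created_at_values:
        timestamp = datetime.strptime(value, CREATED_AT_FORMAT)
        counts[timestamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts


# Made-up timestamps, not measured data.
sample = [
    "Sun May 06 10:15:00 +0000 2012",
    "Sun May 06 10:59:59 +0000 2012",
    "Sun May 06 11:00:01 +0000 2012",
]
for hour, count in sorted(tweets_per_hour(sample).items()):
    print(hour, count)
```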
==Statistics about the data:==
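The compact encodings listed in the table of this section can be sketched in a few lines of Python. This is an illustration only: the one-byte and two-byte forms of <code>Num</code> reproduce the two examples in the table, while the four-byte and five-byte layouts are an assumption about how the remaining cases are handled; the linked Num.java is the authoritative definition. The function names <code>encode_num</code>, <code>encode_token</code> and <code>encode_tokens</code> are hypothetical.

```python
def encode_num(value: int) -> bytes:
    """Compress an integer into 1, 2, 4 or 5 bytes (sketch)."""
    if 0 <= value <= 0x3F:
        # 00xxxxxx: value fits into 6 bits
        return bytes([value])
    if 0 <= value <= 0x3FFF:
        # 01xxxxxx xxxxxxxx: value fits into 14 bits
        return bytes([0x40 | (value >> 8), value & 0xFF])
    if 0 <= value <= 0x3FFFFFFF:
        # Assumption: 10xxxxxx followed by three more bytes (30 bits)
        return bytes([
            0x80 | (value >> 24),
            (value >> 16) & 0xFF,
            (value >> 8) & 0xFF,
            value & 0xFF,
        ])
    # Assumption: marker byte C0 followed by the full 32-bit value
    return bytes([0xC0]) + (value & 0xFFFFFFFF).to_bytes(4, "big")


def encode_token(text: str) -> bytes:
    """Length as Num, followed by the UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_num(len(data)) + data


def encode_tokens(values) -> bytes:
    """Number of entries as Num, followed by each entry as a token."""
    return encode_num(len(values)) + b"".join(encode_token(v) for v in values)


print(encode_num(511).hex(" "))
print(encode_token("Hello").hex(" "))
```

Storing the length in as few bytes as possible pays off because most tokens are short, so the common case costs a single length byte.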
{| class="wikitable" width="50%"
|-
! Type
! Description
! Example (native → hex integers)
|-
| {{Type|Num}}
| Compressed integer (1-5 bytes), specified in [https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/util/Num.java Num.java]
| {{Mono|15}} → {{Mono|0F}}; {{Mono|511}} → {{Mono|41 FF}}
|-
| {{Type|Token}}
| Length ({{Type|Num}}) and bytes of UTF8 byte representation
| {{Mono|Hello}} → {{Mono|05 48 65 6c 6c 6f}}
|-
| {{Type|Double}}
| Number, stored as token
| {{Mono|123}} → {{Mono|03 31 32 33}}
|-
| {{Type|Boolean}}
| Boolean (1 byte, {{Mono|00}} or {{Mono|01}})
| {{Mono|true}} → {{Mono|01}}
|-
| {{Type|Nums}}, {{Type|Tokens}}, {{Type|Doubles}}
| Arrays of values, introduced with the number of entries
| {{Mono|1,2}} → {{Mono|02 01 31 01 32}}
|-
| {{Type|TokenSet}}
| Key array ({{Type|Tokens}}), next/bucket/size arrays (3x {{Type|Nums}})
|
|}
==Example Tweet (JSON):==
<pre>
</pre>
==Example Tweet (XML):==
<pre class="brush:xml">&lt;json booleans="retweeted possibly__sensitive truncated favorited possibly__sensitive__editable default__profile profile__background__tile protected contributors__enabled verified default__profile__image geo__enabled profile__use__background__image show__all__inline__media is__translator"