For retrieving the Twitter stream, we connect to Twitter's endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [http://www.json.org/ JSON] objects, these objects have to be converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. In the examples section, both versions are shown ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]). For storing the tweets including their meta-data, we use the standard ''insert'' expression of [[Updates|XQuery Update]].
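
A minimal sketch of this conversion and insertion step, assuming a database named <code>tweets</code> with a <code><tweets/></code> root node and a variable <code>$json</code> holding the raw JSON string of a single tweet (both names are chosen for illustration):

<pre>
(: convert a single tweet from JSON to XML and append it to the root node :)
declare variable $json external;
insert node json:parse($json) into db:open('tweets')/tweets
</pre>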
=Twitter's Streaming Data=
Each tweet object in the data stream contains the tweet message itself and over 60 data fields (for further information, see the [https://dev.twitter.com/docs/platform-objects fields description]). The examples section shows the structure of such an object as a [[#Example Tweet (JSON)|tweet as JSON]] and as a [[#Example Tweet (XML)|tweet as XML]].
The following section shows the amount of data that is delivered per hour by the Twitter Streaming API to the connected endpoints with the 10% gardenhose access, measured on the 6th of February, March, April, and May. It is the pure public live stream without any filtering applied.
 
= BaseX Performance =
 
The tests show the time BaseX needs to insert large amounts of real tweets into a database. We can conclude that BaseX scales very well and can keep up with the incoming number of tweets in the stream. Occasional lower values can occur because the size of the tweets differs depending on the meta-data contained in the tweet object.<br />
Note: The {{Option|AUTOFLUSH}} option is set to <code>FALSE</code>.
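
With this option disabled, updates are not written to disk after each operation; a possible command sequence (a sketch, using an assumed database named <code>tweets</code>) flushes the data explicitly once a batch has been inserted:

<pre>
SET AUTOFLUSH false
OPEN tweets
XQUERY insert node &lt;tweet&gt;example&lt;/tweet&gt; into /tweets
FLUSH
</pre>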
 
System Setup: Mac OS X 10.6.8, 3.2 GHz Intel Core i3, 8 GB 1333 MHz DDR3 RAM <br/>
BaseX Version: BaseX 7.3 beta
 
== Insert with XQuery Update ==
 
These tests show the performance of BaseX when performing inserts with XQuery Update, either as a single update per tweet or as bulk updates with different numbers of tweets.
The initial database just contained a root node <code><tweets/></code>; all incoming tweets are converted from JSON to XML and inserted into the root node.
The time needed for the inserts includes the conversion time.
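
As a sketch of the bulk variant, assuming <code>$batch</code> holds a sequence of raw JSON tweet strings (the name is chosen for illustration), all tweets of a batch can be converted and inserted within a single update operation:

<pre>
(: bulk update: one insert statement covers the whole batch :)
declare variable $batch external;
insert node (
  for $json in $batch
  return json:parse($json)
) into db:open('tweets')/tweets
</pre>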
 
=== Single Updates ===
 
{| class="wikitable" width="50%"
|-
! Number of tweets
! Time in seconds
! Time in minutes
! Database size (without indexes)
|-
| 1,000,000
| 492.26
| 8.2
| 3396 MB
|-
| 2,000,000
| 461.87
| 7.7
| 6997 MB
|-
| 3,000,000
| 470.71
| 7.8
| 10452 MB
|}
 
[[File:insertTweets.png]]