This article is part of the [[Advanced User's Guide]]. It describes how BaseX can be used to process and store the live data stream of Twitter. We present some statistics on the Twitter data and on the performance of BaseX.
As [http://twitter.com Twitter] attracts more and more users (over 140 million active users in 2012) and generates large amounts of data (over 340 million short messages ('tweets') daily), it has become an exciting data source for all kinds of analytics. Twitter provides the developer community with a set of [https://developer.twitter.com/start APIs] for retrieving data about its users and their communication, including the [https://dev.twitter.com/docs/streaming-apis Streaming API] for data-intensive applications, the [https://dev.twitter.com/docs/using-search Search API] for querying and filtering the messaging content, and the [https://dev.twitter.com/docs/api REST API] for accessing the core primitives of the Twitter platform.
=BaseX as Twitter Storage=
To retrieve the Twitter stream, we connect to the Twitter endpoint via the Streaming API and receive a never-ending stream of tweets. As Twitter delivers the tweets as [https://www.json.org/ JSON] objects, the data is converted into XML fragments. For this purpose, the parse function of the [[JSON Module|XQuery JSON Module]] is used. In the examples section both versions are shown ([[#Example Tweet (JSON)|tweet as JSON]] and [[#Example Tweet (XML)|tweet as XML]]). To store the tweets, including their metadata, we use the standard ''insert'' function of [[Updates|XQuery Update]].
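The conversion and insertion steps can be sketched in a few lines of XQuery. This is only a minimal illustration, not the setup used for the measurements in this article: the tweet is a shortened, made-up example, and the <code>&lt;tweets/&gt;</code> container is a hypothetical in-memory stand-in for a database node, so that the query runs without an existing database.

```xquery
(: Shortened, made-up tweet; real tweets carry many more fields. :)
let $tweet := '{ "text": "Hello BaseX", "lang": "en", "user": { "name": "example" } }'
(: json:parse returns a document node; its root element is <json>. :)
let $xml := json:parse($tweet)/json
return
  (: Hypothetical in-memory container standing in for a database node. :)
  copy $store := <tweets/>
  modify insert node $xml into $store
  return $store
```

When working against a real database, the <code>insert node</code> expression would presumably target a node of that database instead of the copied <code>$store</code> element.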
=Twitter’s Streaming Data=