Databases

From BaseX Documentation

This page is part of the [[Getting Started]] Section.

In BaseX, a ''database'' is a lightweight concept. It may contain one or more '''resources''', each of which is addressed by a unique database path. There is no explicit layer for collections: collections are created and deleted implicitly, as they result from the existence of documents in specific paths. Resources can be either '''XML documents''' or '''raw files''' (binaries). Some information on [[Binary Data|binary data]] can be found on an extra page.

Multiple databases can be addressed (queried, updated) with a single XQuery expression. As a single database is restricted to 2 billion nodes (see [[Statistics]]), resources can be distributed across multiple database instances.

=Create Databases=

Databases can be created via commands, via XQuery, in the GUI, or with any of our [[Developing|APIs]]. If initial input is specified along with the create operation, some time can be saved, as the specified resources will be added to the database in a bulk operation:

* [[Startup#BaseX Standalone|Console]]: <code>CREATE DB db /path/to/resources</code> will add initial documents to a database
* [[Startup#BaseX GUI|GUI]]: Go to ''Database'' → ''New'', press ''Browse'' to choose an initial file or directory, and press ''OK''
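
A rough XQuery counterpart to the console command above might look as follows. This is a minimal sketch that assumes the <code>db:create</code> function of the [[Database Module]]; the database name and the input path are the placeholders from the console example, not real locations.

<pre class="brush:xquery">
(: create the database "db" and add all resources found at the
   given path (placeholder) as initial content, in one bulk operation :)
db:create("db", "/path/to/resources")
</pre>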

Database names are restricted to a limited set of characters (see [[Valid Names]]). Various [[parsers]] can be chosen to control the import process, or to convert different formats to XML.

'''Note:''' A main-memory database will be created if the {{Option|MAINMEM}} option is enabled ([[Databases#In Memory Database|see below]] for more).

=Access Resources=

{| class="wikitable"
|-
!Function
!Example
!Description
|-
|[[Database Module#db:open|db:open]]
|{{Code|db:open("db", "path/to/docs")}}
|Returns all documents that are found in the database {{Code|db}} at the (optional) path {{Code|path/to/docs}}.
|-
|[http://www.xqueryfunctions.com/xq/fn_collection.html fn:collection]
|{{Code|collection("db/path/to/docs")}}
|Returns all documents at the location {{Code|path/to/docs}} in the database {{Code|db}}.<br/>If no path is specified after the database, all documents in the database will be returned.<br/>If no argument is specified, all documents of the database that has been opened in the global context will be returned.
|-
|[http://www.xqueryfunctions.com/xq/fn_doc.html fn:doc]
|{{Code|doc("db/path/to/doc.xml")}}
|Returns the document at the location {{Code|path/to/doc.xml}} in the database {{Code|db}}.<br/>An error is raised if the specified address yields zero or more than one document.
|}

You can access multiple databases in a single query:

<pre class="brush:xquery">
for $i in 1 to 100
return db:open('books' || $i)//book/title
</pre>

If the {{Option|DEFAULTDB}} option is turned on, the path argument of the {{Code|fn:doc}} or {{Code|fn:collection}} function will first be resolved against the globally opened database.

Two more functions are available for retrieving information on database nodes:

{| class="wikitable"
|-
!Function
!Example
!Description
|-
|[[Database Module#db:name|db:name]]
|{{Code|db:name($node)}}
|Returns the name of the database in which the specified {{Code|$node}} is stored.
|-
|[[Database Module#db:path|db:path]]
|{{Code|db:path($node)}}
|Returns the path of the database document in which the specified {{Code|$node}} is stored.
|}
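
To illustrate how the two functions complement each other, the following sketch joins their results for every document found at a path. The database name <code>db</code> and the path <code>path/to/docs</code> are the placeholders used in the tables above.

<pre class="brush:xquery">
(: for each document, build a string of the form "database/path" :)
for $doc in db:open('db', 'path/to/docs')
return db:name($doc) || '/' || db:path($doc)
</pre>

Each resulting string has the same shape as the argument of {{Code|doc("db/path/to/doc.xml")}}.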

The {{Code|fn:document-uri}} and {{Code|fn:base-uri}} functions return URIs that can also be reused as arguments for the {{Code|fn:doc}} and {{Code|fn:collection}} functions. As a result, the following example query always returns {{Code|true}}:

<pre class="brush:xquery">
every $c in collection('anyDB')
satisfies doc-available(document-uri($c))
</pre>

If the argument of {{Code|fn:doc}} or {{Code|fn:collection}} does not start with a valid database name, or if the addressed database does not exist, the string is interpreted as a URI reference, and the documents found at this location will be returned. Examples:

* {{Code|doc("http://web.de")}}: retrieves the addressed URI and returns it as a main-memory document node.
* {{Code|doc("myfile.xml")}}: retrieves the given file from the file system and returns it as a main-memory document node. Note that updates to main-memory nodes are not automatically written back to disk unless the {{Option|WRITEBACK}} option is set.
* {{Code|collection("/path/to/docs")}}: returns a main-memory collection with all XML documents found at the addressed file path.
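
If a query should not rely on this implicit fallback, the choice can be made explicit. The following is only a sketch: it assumes the <code>db:exists</code> function of the [[Database Module]], and both the database name and the file path are placeholders.

<pre class="brush:xquery">
let $name := 'db'
return if (db:exists($name)) then
  (: the database exists: return its documents :)
  collection($name)
else
  (: otherwise, read the XML documents from a file path (placeholder) :)
  collection('/path/to/docs')
</pre>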

==Raw Files==

The <code>[[Commands#RETRIEVE|RETRIEVE]]</code> command and the <code>[[Database Module#db:retrieve|db:retrieve]]</code> function can be used to return files in their native byte representation.

If the API you use does not support binary output (as is the case, for example, for various [[Clients|Client]] language bindings), you need to convert your binary data to its string representation before returning it to the client:

<pre class="brush:xquery">
string(db:retrieve('multimedia', 'sample.avi'))
</pre>
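
Where binary output is available, no string conversion is needed. As a sketch, the retrieved item could be written straight to a file with <code>file:write-binary</code> from the File Module; the target path below is a hypothetical location chosen for illustration.

<pre class="brush:xquery">
(: write the stored raw file to disk in its original byte representation;
   the target path is a placeholder :)
file:write-binary(
  '/tmp/sample-copy.avi',
  db:retrieve('multimedia', 'sample.avi')
)
</pre>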

==HTTP Services==

=Update Resources=

Once you have created a database, additional commands exist to modify its contents:

* XML documents can be added with the <code>[[Commands#ADD|ADD]]</code> command.
* Raw files are added with <code>[[Commands#STORE|STORE]]</code>.
* Existing resources can be replaced with the <code>[[Commands#REPLACE|REPLACE]]</code> command.
* Resources can be deleted via <code>[[Commands#DELETE|DELETE]]</code>.

The {{Option|AUTOFLUSH}} option can be turned off before ''bulk operations'' (i.e. before a large number of new resources is added to the database).

If {{Option|ADDCACHE}} is enabled, the input will be cached before it is added to the database. This is helpful when the input documents to be added are expected to consume too much main memory.

The following commands create an empty database, add two resources, explicitly flush data structures to disk, and finally delete all inserted data:

<pre>
CREATE DB example
SET AUTOFLUSH false
ADD example.xml
SET ADDCACHE true
ADD /path/to/xml/documents
STORE TO images/ 123.jpg
FLUSH
DELETE /
</pre>

You may also use the BaseX-specific [[Database Module|XQuery Database Functions]] to create, add, replace, and delete XML documents:

<pre class="brush:xquery">
let $root := "/path/to/xml/documents/"
for $file in file:list($root)
return db:add("database", $root || $file)
</pre>

Last but not least, XML documents can also be added via the GUI and the ''Database'' menu.

=Export Data=

All resources stored in a database can be exported, i.e., written back to disk. This can be done in several ways:

* GUI: Go to ''Database'' → ''Export'', choose the target directory and press ''OK''
* WebDAV: Locate the database directory (or a sub-directory of it) and copy all contents to another location
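
An export can presumably also be triggered from within a query. The sketch below assumes the <code>db:export</code> function of the [[Database Module]]; the database name and the target directory are placeholders.

<pre class="brush:xquery">
(: write all resources of the database to the given directory (placeholder) :)
db:export("database", "/path/to/export-dir")
</pre>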

=Main-Memory Database Instances=

* In the standalone context, a main-memory database can be created (using <code>CREATE DB</code>), which can then be accessed by subsequent commands.
* If a BaseX server instance is started, and if a database is created in its context (using <code>CREATE DB</code>), other BaseX client instances can access (and update) this database (using <code>OPEN</code>, <code>db:open</code>, etc.) as long as no other database is opened/created by the server.
* You can force an ordinary database to be copied to memory by using <code>db:open('some-db') update {}</code>.

'''Note:''' If you address a URI with <code>fn:doc</code> or <code>fn:collection</code> for which no database exists, the resulting internal representation is identical to that of main-memory database instances (no matter which value is set for {{Option|MAINMEM}}).

=Changelog=

;Version 8.4
* Updated: [[#Raw Files|Raw Files]]: Items of binary type can be output without specifying the obsolete <code>raw</code> serialization method.

;Version 7.2.1
* Updated: {{Code|fn:document-uri}} and {{Code|fn:base-uri}} now return strings that can be reused with {{Code|fn:doc}} or {{Code|fn:collection}} to reopen the original document.

Latest revision as of 13:32, 10 April 2019

This page is part of the Getting Started Section.

In BaseX, a database is a lightweight concept. It may contain one or more resources, each of which is addressed by a unique database path. There is no explicit layer for collections: collections are created and deleted implicitly, as they result from the existence of documents in specific paths. Resources can be either XML documents or raw files (binaries). Some information on binary data can be found on an extra page.

Multiple databases can be addressed (queried, updated) with a single XQuery expression. As a single database is restricted to 2 billion nodes (see Statistics), resources can be distributed across multiple database instances.

Create Databases

Databases can be created via commands, via XQuery, in the GUI, or with any of our APIs. If initial input is specified along with the create operation, some time can be saved, as the specified resources will be added to the database in a bulk operation:

Database names are restricted to a limited set of characters (see Valid Names). Various parsers can be chosen to control the import process, or to convert different formats to XML.

Note: A main-memory database will be created if the MAINMEM option is enabled (see below for more).

Access Resources

Stored resources and external documents can be accessed in different ways:

XML Documents

Various XQuery functions exist to access XML documents in databases:

{| class="wikitable"
|-
!Function
!Example
!Description
|-
|db:open
|db:open("db", "path/to/docs")
|Returns all documents that are found in the database db at the (optional) path path/to/docs.
|-
|fn:collection
|collection("db/path/to/docs")
|Returns all documents at the location path/to/docs in the database db.<br/>If no path is specified after the database, all documents in the database will be returned.<br/>If no argument is specified, all documents of the database that has been opened in the global context will be returned.
|-
|fn:doc
|doc("db/path/to/doc.xml")
|Returns the document at the location path/to/doc.xml in the database db.<br/>An error is raised if the specified address yields zero or more than one document.
|}

You can access multiple databases in a single query:

for $i in 1 to 100
return db:open('books' || $i)//book/title

If the DEFAULTDB option is turned on, the path argument of the fn:doc or fn:collection function will first be resolved against the globally opened database.

Two more functions are available for retrieving information on database nodes:

{| class="wikitable"
|-
!Function
!Example
!Description
|-
|db:name
|db:name($node)
|Returns the name of the database in which the specified $node is stored.
|-
|db:path
|db:path($node)
|Returns the path of the database document in which the specified $node is stored.
|}

The fn:document-uri and fn:base-uri functions return URIs that can also be reused as arguments for the fn:doc and fn:collection functions. As a result, the following example query always returns true:

every $c in collection('anyDB')
satisfies doc-available(document-uri($c))

If the argument of fn:doc or fn:collection does not start with a valid database name, or if the addressed database does not exist, the string is interpreted as a URI reference, and the documents found at this location will be returned. Examples:

Raw Files

The RETRIEVE command and the db:retrieve function can be used to return files in their native byte representation.

If the API you use does not support binary output (as is the case, for example, for various Client language bindings), you need to convert your binary data to its string representation before returning it to the client:

string(db:retrieve('multimedia', 'sample.avi'))

HTTP Services

Update Resources

Once you have created a database, additional commands exist to modify its contents:

The AUTOFLUSH option can be turned off before bulk operations (i.e. before a large number of new resources is added to the database).

If ADDCACHE is enabled, the input will be cached before it is added to the database. This is helpful when the input documents to be added are expected to consume too much main memory.

The following commands create an empty database, add two resources, explicitly flush data structures to disk, and finally delete all inserted data:

CREATE DB example
SET AUTOFLUSH false
ADD example.xml
SET ADDCACHE true
ADD /path/to/xml/documents
STORE TO images/ 123.jpg
FLUSH
DELETE /

You may also use the BaseX-specific XQuery Database Functions to create, add, replace, and delete XML documents:

let $root := "/path/to/xml/documents/"
for $file in file:list($root)
return db:add("database", $root || $file)
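
The example above only adds documents. Replacing and deleting could be sketched in the same style, assuming the <code>db:replace</code> and <code>db:delete</code> functions of the Database Module as available in BaseX releases contemporary with this page; all names and paths are placeholders.

<pre class="brush:xquery">
(: replace one document with a new version read from disk :)
db:replace("database", "example.xml", "/path/to/example.xml"),
(: delete all resources stored under the database path "old/" :)
db:delete("database", "old/")
</pre>

Both calls are updating expressions, so their effects become visible only after the query has finished.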

Last but not least, XML documents can also be added via the GUI and the Database menu.

Export Data

All resources stored in a database can be exported, i.e., written back to disk. This can be done in several ways:

Main-Memory Database Instances

Note: If you address a URI with fn:doc or fn:collection for which no database exists, the resulting internal representation is identical to that of main-memory database instances (no matter which value is set for MAINMEM).

Changelog

Version 8.4
Version 7.2.1