Databases

From BaseX Documentation
Latest revision as of 17:39, 1 December 2023

This page is part of the Getting Started Section.

In BaseX, a database is a lightweight concept. It may contain one or more resources, each of which is addressed by a unique database path. There is no explicit layer for collections: collections are created and deleted implicitly, as they result from the existence of documents in specific paths.

A single database is restricted to 2 billion XML nodes (see Statistics), but resources can easily be distributed across multiple database instances. Multiple databases can be addressed (queried, updated) by a single XQuery expression.

Three different resource types exist:

Resource Type | Description
XML Documents | The default resource type. The storage and index features are optimized for XML contents, or any other contents stored in an XML representation.
Binary Data | Raw data of any type, stored in its binary representation. See Binary Data for more information.
XQuery Values | Results of XQuery expressions, stored in a binary representation for fast retrieval. All value types are supported, including maps and arrays, but excluding any other function items.
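
As a minimal sketch of the three resource types, the following query stores one resource of each kind. It assumes that a database named documents already exists; the resource paths and contents are made up for illustration:

```xquery
(: Assumption: the database 'documents' exists; paths and values are illustrative :)
(: XML document :)
db:put('documents', <doc>Hello</doc>, 'sample.xml'),
(: binary data :)
db:put-binary('documents', xs:base64Binary('AAEC'), 'sample.bin'),
(: XQuery value (here: a map) :)
db:put-value('documents', map { 'a': 1 }, 'sample-value')
```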

Create Databases

Databases can be created via Commands, via XQuery, in the GUI, and with various APIs. If an initial input is specified with a create operation, some time can be saved, as the specified resources will be added to the database in a bulk operation:

  • Command-Line: CREATE DB documents /path/to/resources adds the resources in the specified path to a database named documents.
  • GUI: Go to Database → New, press Browse… to choose an initial file or directory, and press OK.

The database name is composed of a restricted set of characters (see Valid Names). Various Parsers can be selected to control the import process, or to convert other input formats to XML.
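
In XQuery, a database can presumably be created with db:create from the Database Module. The following is only a sketch, reusing the database name and the input path from the command-line example above:

```xquery
(: Sketch: creates the database 'documents' and adds the resources
   found at the given (illustrative) path in a bulk operation :)
db:create('documents', '/path/to/resources')
```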

Access Resources

Stored resources and external documents can be accessed in different ways:

XML Documents

Various XQuery functions exist to access XML documents in databases:

Function | Example | Description
db:get | db:get("db", "path/to/docs") | Returns all documents that are found in the database db at the (optional) path path/to/docs.
fn:collection | collection("db/path/to/docs") | Returns all documents at the location path/to/docs in the database db. If no path is specified after the database, all documents in the database are returned. If no argument is specified, all documents of the database that has been opened in the global context are returned.
fn:doc | doc("db/path/to/doc.xml") | Returns the document at the location path/to/doc.xml in the database db. An error is raised if the specified path yields zero or more than one document.

You can access multiple databases in a single query:

for $i in 1 to 100
return db:get('books' || $i)//book/title

If the DEFAULTDB option is turned on, the path argument of the fn:doc or fn:collection functions will first be resolved against the globally opened database.

Two more functions are available for retrieving information on database nodes:

Function | Example | Description
db:name | db:name($node) | Returns the name of the database in which the specified $node is stored.
db:path | db:path($node) | Returns the path of the database document in which the specified $node is stored.
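
The two functions can be combined to list where each document is stored. This is a small sketch; the database name db and the path are placeholders taken from the examples above:

```xquery
(: Assumption: a database named 'db' with documents below 'path/to/docs' exists :)
for $doc in db:get('db', 'path/to/docs')
(: db:name yields the database name, db:path the document path inside it :)
return db:name($doc) || '/' || db:path($doc)
```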

The fn:document-uri and fn:base-uri functions return URIs that can also be reused as arguments for the fn:doc and fn:collection functions. As a result, the following example query always returns true:

every $c in collection('anyDB')
satisfies doc-available(document-uri($c))

If the argument of fn:doc or fn:collection does not start with a valid database name, or if the addressed database does not exist, the string is interpreted as a URI reference, and the documents found at this location will be returned. Examples:

  • doc("http://web.de"): retrieves the addressed URI and returns it as a main-memory document node.
  • doc("myfile.xml"): retrieves the given file from the file system and returns it as a main-memory document node. Note that updates to main-memory nodes are not automatically written back to disk unless the WRITEBACK option is set.
  • collection("/path/to/docs"): returns a main-memory collection with all XML documents found at the addressed file path.

Binary Data

The BINARY GET command and the db:get-binary function can be used to return files in their native byte representation.

If the API you use does not support binary output (which is e.g. the case for various Client language bindings), you can convert your binary data to its string representation before returning it to the client:

string(db:get-binary('multimedia', 'sample.avi'))

XQuery Values

With db:get-value, XQuery values can be retrieved. In the following example, we assume that an XQuery map cities was stored in an indexes database:

let $city-map := db:get-value('indexes', 'cities')
return $city-map?Chile
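
How the map was stored is not shown here. One way it might have been written is with db:put-value; the map entries below are invented for illustration, and the indexes database is assumed to exist:

```xquery
(: Assumption: the database 'indexes' exists; the map entries are hypothetical :)
db:put-value('indexes', map {
  'Chile': 'Santiago',
  'Peru': 'Lima'
}, 'cities')
```

With such a map in place, the lookup $city-map?Chile would return the value stored under the key Chile.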

Update Resources

Commands

Once you have created a database, additional commands exist to modify its contents:

  • XML documents can be added with the PUT and ADD commands.
  • Binary data is stored with BINARY PUT.
  • Resources of all types can be deleted via DELETE.

AUTOFLUSH can be turned off before bulk operations (i.e., before numerous new resources are added to the database).

If ADDCACHE is enabled, the input will be cached before it is added to the database. This is helpful when the input documents are expected to consume too much main memory.

With the following command script, an empty database is created, XML input is added twice (once directly, once cached), a binary file is stored, and all data is exported to the file system:

CREATE DB example
SET AUTOFLUSH false
ADD example.xml
SET ADDCACHE true
ADD /path/to/xml/documents
BINARY PUT TO images/ 123.jpg
EXPORT /path/to/file-system/

XQuery

You can also use functions from the Database Module to add, replace, or delete XML documents:

db:add('documents', '/path/to/xml/resources/')
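
Replacing and deleting follow the same pattern. The sketch below uses db:put, which replaces an existing resource at the target path, and db:delete; the file and resource paths are illustrative only:

```xquery
(: Assumption: the database 'documents' exists; all paths are illustrative :)
(: store or replace the XML document at the database path 'new.xml' :)
db:put('documents', '/path/to/xml/resources/new.xml', 'new.xml'),
(: delete the resource stored at the database path 'old.xml' :)
db:delete('documents', 'old.xml')
```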

Functions from other modules, such as the File Module, can be utilized to filter the input. With the following code, all files that contain numbers in the filename are selected and stored as XML. If an input file is not well-formed XML, it is stored as a binary resource, and the error message is stored as a string value:

let $db := 'documents'
let $root := '/path/to/resources/'
for $path in file:list($root)
where matches($path, '\d+')
return try {
  db:put($db, fetch:doc($root || $path), $path)
} catch * {
  db:put-binary($db, $root || $path, $path),
  db:put-value($db, $err:description, $path || '.error')
}

The error messages can then be analyzed in a second step, for example:

let $errors := db:get-value('documents')
for $filename in map:keys($errors)
where ends-with($filename, '.error')
return $filename || ': ' || $errors?($filename)

Export Database

All resources stored in a database can be exported, i.e., written back to disk, for example as follows:

  • Commands: EXPORT writes all resources to the specified target directory.
  • GUI: Go to Database → Export, choose the target directory and press OK.
  • XQuery: Use db:export.
  • WebDAV: Locate the database directory (or a subdirectory of it) and copy all contents to another location.
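
For the XQuery variant, a call might look as follows. This is a sketch only; the database name and the target directory are placeholders:

```xquery
(: Assumption: the database 'documents' exists; the target directory is illustrative :)
db:export('documents', '/path/to/file-system/')
```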

Main-Memory Databases

If the MAINMEM option is enabled, newly created databases are kept in main memory. In the standalone context, a main-memory database created this way can be accessed by subsequent commands.

If a BaseX server is started, and if a database is created in its context at startup time, e.g., with the command-line option -c and a CREATE DB call, BaseX clients can then access and update this database:

# Server
basexserver -c"SET mainmem on" -c"CREATE DB mainmem document.xml"
BaseX [Server]
Server was started (port: 1984).
MAINMEM: true
Database 'mainmem' created in 1782.80 ms.

# Client
basexclient
Username: ...
Password: ...
BaseX [Client]
Try 'help' to get more information.
> XQUERY count(db:get('mainmem')//*)
1876462
Query executed in 0.97 ms.

Additional notes:

  • You can force an ordinary database, or parts of it, to be temporarily copied to memory by applying an empty main-memory update on a database node: db:get('some-db') update { }
  • If you open local or remote documents with fn:doc or fn:collection, the resulting internal representation is identical to that of main-memory database instances (regardless of which value is set for MAINMEM).
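
The first note can be sketched as a complete query. It assumes that a database named some-db exists; the element count merely stands in for any work done on the copy:

```xquery
(: Assumption: a database named 'some-db' exists :)
(: the empty update yields a main-memory copy of the database nodes :)
let $copy := db:get('some-db') update { }
(: the copy can then be processed without touching the stored database :)
return count($copy//*)
```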

Changelog

Version 10.0
  • Added: New resource type for XQuery values.
Version 8.4
  • Updated: Items of binary type can be output without specifying the obsolete raw serialization method.
Version 7.2.1
  • Updated: fn:document-uri and fn:base-uri now return strings that can be reused with fn:doc or fn:collection to reopen the original document.