Databases
In BaseX, a database is a fairly lightweight construct. It may contain one or more resources, which are addressed by a unique database path. There is no explicit layer for collections; instead, collections are created and deleted implicitly, reflecting the documents that exist at a given path.
A single database is restricted to 2 billion XML nodes (see Statistics), but resources can easily be distributed across multiple database instances. Multiple databases can be addressed (queried, updated) by a single XQuery expression.
Three different resource types exist:
| Resource Type | Description |
|---|---|
| XML Documents | The default resource type. The storage and index features are optimized for XML content, or for any other data stored in an XML representation. |
| Binary Data | Raw data of any type, stored in its binary representation. See Binary Data for more information. |
| XQuery Values | Results of XQuery expressions, stored in a binary representation for fast retrieval. All value types are supported, including maps and arrays, but excluding any other function items. |
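To check which of the three types a stored resource has, the db:list and db:type functions of the Database Functions module can be combined. The following is a minimal sketch, assuming that a database named documents exists (the name is only an illustration):

```xquery
(: List each resource path returned for the 'documents' database together
   with its type. The database name is an assumption; db:type is expected
   to return 'xml', 'binary', or 'value'. :)
for $path in db:list('documents')
return $path || ': ' || db:type('documents', $path)
```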
Create Databases
Databases can be created via Commands, via XQuery, in the Graphical User Interface, and through various APIs. Specifying an initial input as part of the create operation is faster, because the resources are then added to the database in a single bulk operation:
- Command-Line Interface: CREATE DB documents /path/to/resources adds the resources at the specified path to a database named documents.
- Graphical User Interface: Go to Database → New, press Browse… to choose an initial file or directory, and press OK.
The database name must consist of a restricted set of characters (see Valid Names). Various Parsers can be selected to customize the import process, or to convert data from other input formats to XML.
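The XQuery route can be sketched with db:create from the Database Functions module. The database name and input path below simply mirror the command-line example and are placeholders:

```xquery
(: Create the database 'documents' and import the given directory as part
   of the create operation. Name and path are placeholders taken from the
   CREATE DB example. :)
db:create('documents', '/path/to/resources')
```

As with the command, passing the input to the create call avoids adding the resources one by one afterwards.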
Access Resources
Stored resources and external documents can be accessed in different ways:
XML Documents
Various XQuery functions exist to access XML documents in databases:
| Function | Example | Description |
|---|---|---|
| db:get | db:get("db", "path/to/docs") | Returns all documents located at path/to/docs in the database db. If the path argument is omitted, all documents of the database are returned. |
| fn:collection | collection("db/path/to/docs") | Returns all documents located at path/to/docs in the database db. If no path is specified after the name of the database, all documents of the database are returned. If a database has been opened in the global context, and if no argument is specified, all documents of that database are returned. |
| fn:doc | doc("db/path/to/doc.xml") | Returns the document located at path/to/doc.xml in the database db. An error is raised if the path addresses zero documents, or more than one. |
You can access multiple databases in a single query:
for $i in 1 to 100
return db:get('books' || $i)//book/title
If DEFAULTDB is enabled, the path argument of fn:doc and fn:collection is first interpreted as a database path and resolved against the globally opened database.
Two more functions are available for retrieving information about database nodes:
| Function | Example | Description |
|---|---|---|
| db:name | db:name($node) | Returns the name of the database in which the specified $node is stored. |
| db:path | db:path($node) | Returns the path of the database document in which the specified $node is stored. |
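The two functions are typically used to trace a query result back to its origin. A short sketch, assuming a database named documents that contains title elements (both are illustrative assumptions):

```xquery
(: For each matching node, report the database and the document path it
   comes from. The database name 'documents' and the 'title' element are
   assumptions made for this example. :)
for $title in db:get('documents')//title
return db:name($title) || '/' || db:path($title) || ': ' || string($title)
```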
The fn:document-uri and fn:base-uri functions return URIs that can also be reused as arguments for the fn:doc and fn:collection functions. As a result, the following example query always returns true:
every $c in collection('anyDB')
satisfies doc-available(document-uri($c))
If the argument of fn:doc or fn:collection does not start with a valid database name, or if the addressed database does not exist, the string is interpreted as a URI reference, and the documents found at that location are returned. Examples:
Retrieves the addressed URI and returns it as a main-memory document node:
doc("http://web.de")
Retrieves the given file from the file system and returns it as a main-memory document node. Note that updates to main-memory nodes are not automatically written back to disk unless the WRITEBACK option is set:
doc("myfile.xml")
Returns a main-memory collection with all XML documents found at the addressed file path:
collection("/path/to/docs")
If WITHDB is disabled, fn:doc and fn:collection are never resolved against databases. Disabling this option is recommended if you always use db:get to access databases.
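One way to apply this to a single query is an option declaration in the query prolog. The sketch below assumes that WITHDB can be set this way, like other local options, and uses placeholder file and database names:

```xquery
(: With WITHDB disabled for this query, doc() is resolved as a URI only,
   and the database is addressed explicitly with db:get.
   'myfile.xml' and 'documents' are placeholders. :)
declare option db:withdb 'false';

doc('myfile.xml'),
db:get('documents', 'path/to/docs')
```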
Binary Data
The BINARY GET command and the db:get-binary function can be used to return files in their native byte representation.
If the API you use does not support binary output (as is the case, for example, with various Client language bindings), you can convert your binary data to its string representation before returning it to the client:
string(db:get-binary('multimedia', 'sample.avi'))
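If the bytes are needed unchanged on the machine that runs the query, they can instead be written to a file with file:write-binary from the File Functions. A minimal sketch, with a placeholder target path:

```xquery
(: Write the stored bytes unchanged to a local file.
   The target path is a placeholder. :)
file:write-binary(
  '/path/to/sample.avi',
  db:get-binary('multimedia', 'sample.avi')
)
```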
XQuery Values
With db:get-value, XQuery values can be retrieved. In the following example, we assume that an XQuery map cities was stored in an indexes database:
let $city-map := db:get-value('indexes', 'cities')
return $city-map?Chile
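For completeness, such a map could have been stored beforehand with db:put-value. The sketch assumes that the indexes database already exists; the map entries are sample data:

```xquery
(: Store a map as a value resource at the path 'cities' in the 'indexes'
   database. The entries are sample data for illustration. :)
db:put-value(
  'indexes',
  map { 'Chile': 'Santiago', 'Peru': 'Lima' },
  'cities'
)
```

With this sample data, the lookup $city-map?Chile in the query above would yield Santiago.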
Update Resources
Commands
Once you have created a database, additional commands exist to modify its contents:
- XML documents can be added with the PUT and ADD commands.
- Binary data is stored with BINARY PUT.
- Resources of all types can be deleted via DELETE.
AUTOFLUSH can be turned off before bulk operations (i.e., before numerous new resources are added to the database).
If ADDCACHE is enabled, the input will be cached before it is added to the database. This is helpful when the input documents are expected to consume a large amount of main memory.
With the following command script, an empty database is created, XML resources are added (a single file directly, a directory of documents via the cache), a binary file is stored, and all data is exported to the file system:
CREATE DB example
SET AUTOFLUSH false
ADD example.xml
SET ADDCACHE true
ADD /path/to/xml/documents
BINARY PUT TO images/ 123.jpg
EXPORT /path/to/file-system/
XQuery
You can also use the Database Functions to add, replace, or delete XML documents:
db:add('documents', '/path/to/xml/resources/')
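Replacing and deleting follow the same pattern with db:put and db:delete. A brief sketch in which all paths are placeholders:

```xquery
(: Replace one document (or add it if it does not exist yet) and delete
   another one. All paths are placeholders. :)
db:put('documents', '/path/to/xml/new.xml', 'docs/new.xml'),
db:delete('documents', 'docs/old.xml')
```

Both calls are updating expressions, so the changes are collected and applied together once the query has been evaluated.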
Other function modules, such as the File Functions, can be used to filter the input. In the following code, all files whose names contain digits are selected and stored as XML. If an input file does not contain well-formed XML, it is stored as a binary resource, and the error message is stored as a string value:
let $db := 'documents'
let $root := '/path/to/resources/'
for $path in file:list($root)
where matches($path, '\d+')
return try {
  db:put($db, fetch:doc($root || $path), $path)
} catch * {
  db:put-binary($db, $root || $path, $path),
  db:put-value($db, $err:description, $path || '.error')
}
The error messages can then be analyzed in a second step, for example:
let $errors := db:get-value('documents')
for $filename in map:keys($errors)
where ends-with($filename, '.error')
return $filename || ': ' || $errors?($filename)
Export Database
All resources stored in a database can be exported, i.e., written back to disk, in one of the following ways:
- Commands: EXPORT writes all resources to the specified target directory.
- GUI: Go to Database → Export, choose the target directory and press OK.
- XQuery: Use db:export.
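A minimal db:export call might look as follows. The database name and the target directory are placeholders:

```xquery
(: Export all resources of the 'documents' database to a directory.
   Database name and target path are placeholders. :)
db:export('documents', '/path/to/target-directory/')
```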
Main-Memory Databases
A database can be created in main memory by enabling the MAINMEM option. In the standalone context, a main-memory database can then be created and accessed by subsequent commands.
If a BaseX server is started and a database is created in its context at startup (for example, by combining the -c command-line option with a CREATE DB call), BaseX clients can then access and update that database:
# Server
basexserver -c"SET mainmem on" -c"CREATE DB mainmem document.xml"
BaseX [Server]
Server was started (port: 1984).
MAINMEM: true
Database 'mainmem' created in 1782.80 ms.
# Client
basexclient
Username: ...
Password: ...
BaseX [Client]
Try 'help' to get more information.
> XQUERY count(db:get('mainmem')//*)
1876462
Query executed in 0.97 ms.
Additional notes:
- You can force an ordinary database, or parts of it, to be temporarily copied into memory by applying an empty main-memory update to a database node:
db:get('some-db') update { }
- If you open local or remote documents with fn:doc or fn:collection, the resulting internal representation is identical to that of a main-memory database instance (regardless of the value set for MAINMEM).
Changelog
Version 10.0
- Added: New resource type for XQuery values.
- Updated: Items of binary type can be output without specifying the obsolete raw serialization method.
- Updated: fn:document-uri and fn:base-uri now return strings that can be reused with fn:doc or fn:collection to reopen the original document.