Transaction Management

From BaseX Documentation

Revision as of 16:21, 26 October 2017

This article is part of the Advanced User's Guide. The BaseX client-server architecture offers ACID-safe transactions, with multiple readers and writers. The following sections give more details on transaction management.

Introduction

In a nutshell, a transaction corresponds to a single command or query: each command or query sent to the server becomes a transaction.

Incoming requests are parsed and checked for errors on the server. If the command or query is not correct, the request will not be executed, and the user will receive an error message. Otherwise, the request becomes a transaction and is passed to the transaction monitor.

Please note that:

  • Locks cannot be synchronized across BaseX instances that run in different JVMs. If concurrent write operations are to be performed, we generally recommend working with the client/server or the HTTP architecture.
  • An unexpected abort of the server during a transaction, caused by a hardware failure or power cut, may lead to an inconsistent database state if a transaction was active at shutdown time. It is therefore advisable to use the BACKUP command to back up your database regularly. If the worst case occurs, you can try the INSPECT command to check if your database has obvious inconsistencies, and use RESTORE to restore the last backed-up version of the database.

XQuery Update

Many update operations are triggered by XQuery Update expressions. When executing an updating query, all update operations of the query are stored in a pending update list. They will be executed all at once, so the database is updated atomically. If any of the update sub-operations is erroneous, the overall transaction will be aborted.
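A minimal sketch of the pending update list, using an in-memory copy (an XQuery Update transform expression) instead of a database so that it runs without any setup; the element names are hypothetical. Both update primitives inside modify are collected first and only applied when the clause completes, so the query returns <root><entry/></root>.

```xquery
(: Both primitives are collected in a pending update list
   and applied together once the modify clause completes :)
copy $root := <root><old/></root>
modify (
  insert node <entry/> into $root,
  delete node $root/old
)
return $root
```

With a database as the target, the same rule applies to the whole query: if one primitive raises an error, for example an insert whose target expression returns no node, none of the collected updates are applied.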

Concurrency Control

BaseX provides support for multiple read and single write operations (using preclaiming and starvation-free two-phase locking). This means that:

  • Read transactions are executed in parallel.
  • If an updating transaction comes in, it will be queued and executed after all previous read transactions have been executed.
  • Subsequent operations (read or write) will be queued until the updating transaction has completed.
  • Jobs without database access will never be locked. Globally locking jobs can be executed in parallel with non-locking jobs.
  • Each database has its own queue: An update on database A will not block operations on database B. This is under the premise that it can be statically determined, i.e., before the transaction is evaluated, which databases will be accessed by a transaction (see below).
  • The maximum number of parallel transactions can be adjusted with the PARALLEL option.
  • By default, read transactions are favored, and transactions that access no databases can be evaluated even if the transaction limit has been reached. This behavior can be changed via the FAIRLOCK option.
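As a sketch of the per-database queues, consider two queries sent by different clients at the same time. The database names A and B and the element names are hypothetical, and the sketch assumes that the literal database names are detected before evaluation, as described for XQuery under Limitations. The first query write-locks only A, the second read-locks only B, so the second one does not have to wait for the first.

```xquery
(: Query 1, sent by client 1: write-locks database "A" only :)
delete nodes collection('A')//obsolete
```

```xquery
(: Query 2, sent by client 2: read-locks database "B" only :)
count(collection('B')//item)
```

If query 1 updated B instead, query 2 would be queued until the update had completed.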

XQuery Locks

By default, access to external resources (files on hard disk, HTTP requests, ...) is not controlled by the transaction monitor of BaseX. You can use custom XQuery locks to do so:

Query Options

  • You can declare custom locks via the query:read-lock and query:write-lock options in the query prolog.
  • The value of the option contains the lock string, or multiple ones (separated with commas).
  • Similar to the internal database locks, write locks block all other operations while read locks allow parallel access.
  • The internal locks and XQuery locks can co-exist (there will be no conflicts, even if your lock string equals the name of a database that will be locked by the transaction manager).
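A minimal sketch of a main query that combines these options: two read locks in a single comma-separated declaration, plus a second declaration with a write lock. The lock strings LOG, STATS and CONFIG and the file names are hypothetical.

```xquery
(: Read locks on two hypothetical lock strings, separated by a comma :)
declare option query:read-lock 'LOG,STATS';
(: A further declaration: an exclusive lock on a third lock string :)
declare option query:write-lock 'CONFIG';

(: The hypothetical files are accessed while the locks above are held :)
let $log := file:read-text('log.txt')
let $stats := file:read-text('stats.txt')
return file:write-text('config.txt', $log || $stats)
```

Another query that only declares a read lock on LOG can run in parallel with this one, whereas any query that declares a CONFIG lock is queued until this one has finished.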

In the following two example modules, locks have been added to prevent concurrent write operations on the same file:

module namespace read = 'read';

(:~ Read lock on CONFIG key. :)
declare option query:read-lock 'CONFIG';

declare function read:config() {
  file:read-text('config.txt')
};

module namespace write = 'write';

(:~ Write lock on CONFIG key. :)
declare option query:write-lock 'CONFIG';

declare function write:file($data) {
  file:write-text('config.txt', $data)
};

Some explanations:

  • If a query is parsed that is going to call the read:config function, a read lock will be acquired for the user-defined CONFIG lock string before the query is evaluated.
  • If write:file is referenced by a query, a write lock on this lock string will be set for this query.
  • A query that references write:file will be queued until no running query holds a CONFIG lock.
  • While the writing query is being evaluated, all other queries that request a CONFIG lock (read or write) will have to wait.

In practice, it’s often sufficient to only work with (exclusive) write locks.
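Following that advice, a sketch of a single library module that guards both reading and writing of config.txt with one exclusive lock. The module namespace cfg and its function names are hypothetical.

```xquery
module namespace cfg = 'cfg';

(:~ Exclusive lock on the CONFIG key, used for both reading and writing. :)
declare option query:write-lock 'CONFIG';

declare function cfg:read() {
  file:read-text('config.txt')
};

declare function cfg:write($data) {
  file:write-text('config.txt', $data)
};
```

The trade-off is that two queries that only read config.txt can no longer be evaluated in parallel.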

Java Modules

Locks can also be acquired on Java functions which are imported and invoked from an XQuery expression. It is advisable to explicitly lock Java code whenever it performs sensitive read and write operations.
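One way to serialize calls to such code, sketched from the XQuery side: the calling query declares a write lock, so all queries that declare the same lock string are evaluated one after another. The Java class org.example.Logger, its static append method and the lock string JAVA-LOG are hypothetical.

```xquery
(: Hypothetical Java class with a static method that writes to a log file :)
declare namespace logger = 'java:org.example.Logger';
(: Queries that declare the same lock string are never evaluated concurrently :)
declare option query:write-lock 'JAVA-LOG';

logger:append('entry')
```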

Limitations

Commands

Database locking works with all commands unless the glob syntax is used, such as in the following command call:

  • DROP DB new*: drop all databases starting with "new"

XQuery

Deciding which databases will be accessed by a complex XQuery expression is a non-trivial task. Database detection works for the following types of queries:

  • //item, read-locking of the database opened by a client
  • doc('factbook'), read-locking of "factbook"
  • collection('db/path/to/docs'), read-locking of "db"
  • fn:sum(1 to 100), locking nothing at all
  • delete nodes doc('test')//*[string-length(local-name(.)) > 5], write-locking of "test"

All databases will be locked by queries of the following kind:

  • for $db in ('db1', 'db2') return doc($db)
  • doc(doc('test')/reference/text())
  • let $db := 'test' return insert nodes <test/> into doc($db)

You can consult the query info output (shown in the Info View of the GUI, or enabled by setting QUERYINFO to true) to find out which databases have been locked by a query.
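The queries that lock all databases can often be rewritten so that the database names appear as string literals, matching the detectable patterns listed above. A sketch based on those examples; whether a particular rewrite is detected by your BaseX version is best confirmed via the query info output.

```xquery
(: Instead of: for $db in ('db1', 'db2') return doc($db)
   Each doc() call now has a literal argument, as in doc('factbook') :)
(doc('db1'), doc('db2'))
```

```xquery
(: Instead of: let $db := 'test' return insert nodes <test/> into doc($db)
   Same pattern as the delete example, which write-locks "test" only :)
insert nodes <test/> into doc('test')
```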

File-System Locks

Update Operations

During a database update, a locking file upd.basex will reside in that database directory. If the update fails for some unexpected reason, or if the process is killed ungracefully, this file will not be deleted. In this case, the database cannot be opened anymore, and the message "Database ... is being updated, or update was not completed" will be shown instead.

If the locking file is manually removed, you may be able to reopen the database, but be aware that the database may have become corrupt due to the interrupted update process, and you should revert to the most recent database backup.

Database Locks

To avoid database corruption caused by accidental write operations from different JVMs, a shared lock is requested on the database table file (tbl.basex) whenever a database is opened. If an update operation is triggered and no exclusive lock can be acquired, it will be rejected with the message "Database ... is currently opened by another process."

Please note that you cannot fully rely on this mechanism, as it is not possible to synchronize operations across different JVMs. You will be safe when using the client/server or HTTP architecture.

Changelog

Version 8.6
  • Updated: New FAIRLOCK option, improved detection of lock patterns.
Version 7.8
Version 7.6
  • Added: database locking introduced, replacing process locking.
Version 7.2.1
  • Updated: pin files replaced with shared/exclusive filesystem locking.
Version 7.2
  • Added: pin files to mark open databases.
Version 7.1
  • Added: update lock files.