Difference between revisions of "Transaction Management"

From BaseX Documentation
m (Fix typo)
 
(49 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
This article is part of the [[Advanced User's Guide]].
 
This article is part of the [[Advanced User's Guide]].
The BaseX client-server architecture offers ACID safe transactions,
+
The BaseX client-server architecture offers ACID-safe transactions,
 
with multiple readers and writers. Here is some more
 
with multiple readers and writers. Here is some more
 
information about the transaction management.
 
information about the transaction management.
  
=Transaction=
+
=Introduction=
  
 
In a nutshell, a transaction is equal to a command or query. So each command or query sent to the server becomes a transaction.
 
In a nutshell, a transaction is equal to a command or query. So each command or query sent to the server becomes a transaction.
  
Incoming requests are parsed and checked for errors on the server. If the command or query is not correct, the request will not be executed,
+
Incoming requests are parsed and checked for errors on the server. If the command or query is not correct, the request will not be executed, and the user will receive an error message. Otherwise the request becomes a transaction and gets into the transaction monitor.
and the user will receive an error message. Otherwise the request becomes a transaction and gets into the transaction monitor.
 
  
Note:
+
Please note that:
An unexpected abort of the server during a transaction, caused by a hardware
 
failure or power cut, may lead to an inconsistent database state if a transaction was active at the shutdown time. So we advise to use
 
the [[Commands#CREATE BACKUP|BACKUP]] command to backup your database regularly. If the worst case occurs, you can try the [[Commands#INSPECT|INSPECT]] command to check if your database has obvious inconsistencies, and [[Commands#RESTORE|RESTORE]] to restore a previous version of the database.
 
  
==Update Transactions==
+
* Locks ''cannot be synchronized'' across BaseX instances that run in different JVMs. If concurrent write operations are to be performed, we generally recommend working with the client/server or the HTTP architecture .
 +
* An ''unexpected abort'' of the server during a transaction, caused by a hardware failure or power cut, may lead to an inconsistent database state if a transaction was active at shutdown time. So it is advisable to use the [[Commands#CREATE BACKUP|BACKUP]] command to regularly backup your database. If the worst case occurs, you can try the [[Commands#INSPECT|INSPECT]] command to check if your database has obvious inconsistencies, and use [[Commands#RESTORE|RESTORE]] to restore the last backed up version of the database.
 +
 
 +
==XQuery Update==
  
 
Many update operations are triggered by [[Update|XQuery Update]] expressions. When executing an updating query, all update operations of the query are stored in a pending update list. They will be executed all at once, so the database is updated atomically. If any of the update sub-operations is erroneous, the overall transaction will be aborted.
 
Many update operations are triggered by [[Update|XQuery Update]] expressions. When executing an updating query, all update operations of the query are stored in a pending update list. They will be executed all at once, so the database is updated atomically. If any of the update sub-operations is erroneous, the overall transaction will be aborted.
Line 22: Line 21:
 
=Concurrency Control=
 
=Concurrency Control=
  
BaseX provides locking on database level. Writing transactions do not necessarily block all other transactions any more. The number of parallel transactions can be limited by setting the [[Options#PARALLEL|PARALLEL]] option.
+
BaseX provides support for multiple read and single write operations (using preclaiming and starvation-free two phase locking). This means that:
  
==Transaction Monitor==
+
* Read transactions are executed in parallel.
 +
* If an updating transaction comes in, it will be queued and executed after all previous read transaction have been executed.
 +
* Subsequent operations (read or write) will be queued until the updating transaction has completed.
 +
* Jobs without database access will never be locked. Globally locking jobs can now be executed in parallel with non-locking jobs.
 +
* Each database has its own queue: An update on database A will not block operations on database B. This is under the premise that it can be statically determined, i.e., before the transaction is evaluated, which databases will be accessed by a transaction (see [[#Limitations|below]]).
 +
* The number of maximum parallel transactions can be adjusted with the {{Option|PARALLEL}} option.
 +
* By default, read transactions are favored, and transactions that access no databases can be evaluated even if the transactions limit has been reached. This behavior can be changed via the {{Option|FAIRLOCK}} option.
  
The transaction monitor ensures that just one writing transaction or an arbitrary amount of reading transactions ''per database'' are active at the same time.
+
==Limitations==
  
Deadlocks are prevented by using preclaiming two phase locking. Execution is starvation-free as lock acquisition is queued per database. Due to the specifics of XQuery Update, all updates are written at the end of the query. Locking is strict with the exception that databases for which BaseX recognizes it will not write to are downgraded to read locks.
+
===Commands===
  
Locks are not synchronized between multiple BaseX instances. We generally recommend working with the client/server architecture if concurrent write operations are to be performed.
+
Database locking works with all commands unless the glob syntax is used, such as in the following command call:
  
==External Side Effects==
+
* {{Code|DROP DB new*}}: drop all databases starting with "new"
  
Access to external resources (files on hard disk, HTTP requests, ...) is not controlled by BaseX' transaction monitor unless specified by the user.
+
===XQuery===
  
===XQuery Locking Options===
+
Deciding which databases will be accessed by a complex XQuery expression is a non-trivial task. Database detection works for the following types of queries:
  
Custom locks can be acquired by setting the BaseX-specific XQuery options {{Code|query:read-lock}} and {{Code|query:write-lock}}. Multiple option declarations may occur in the prolog of a query, but multiple values can also be separated with commas in a single declaration. These locks are in another namespace than the database names: the lock value {{Code|factbook}} will not lock a database named factbook.
+
* {{Code|//item}}, read-locking of the database opened by a client
 +
* {{Code|doc('factbook')}}, read-locking of "factbook"
 +
* {{Code|collection('db/path/to/docs')}}, read-locking of "db"
 +
* {{Code|delete nodes db:open('test')//*[string-length(local-name(.)) > 5]}}, write-locking of "test"
 +
* {{Code|fn:sum(1 to 100)}} (no lock)
  
These option declarations will put read locks on ''foo'', ''bar'' and ''batz'' and a write lock on ''quix'':
+
A global lock will be assigned if the name of the database is not a static string:
  
<pre class="brush:xquery">
+
* {{Code|for $db in ('db1', 'db2') return db:open($db)}}
declare option query:read-lock "foo,bar";
+
* {{Code|doc(doc('test')/reference/text())}}
declare option query:read-lock "batz";
+
* <code>let $db := 'test' return insert nodes <test/> into db:open($db)</code>
declare option query:write-lock "quix";
 
</pre>
 
  
===Java Modules===
+
The functions [[Databases#XML Documents|fn:doc]] and [[Databases#XML Documents|fn:collection]] can also be used to address that are not stored in a database. However, this may lead to unwanted locks, and you have two options to reduce the number of locks: No database lookups will take place if {{Option|WITHDB}} option is disabled, or if {{Function|Fetch|fetch:xml}} is used instead of [[Databases#XML Documents|fn:doc]].
  
Locks can also be acquired on [[Java Bindings#Locking|Java functions]] which are imported and invoked from an XQuery expression. It is advisable to explicitly lock Java code whenever it performs sensitive read and write operations.
+
You can consult the query info output (which you find in the [[GUI#Visualizations|Info View]] of the GUI or which you can turn on by setting {{Option|QUERYINFO}} to {{Code|true}}) to find out which databases have been locked by a query.
 +
 
 +
=XQuery Locks=
 +
 
 +
By default, access to external resources (files on hard disk, HTTP requests, ...) is not controlled by the transaction monitor of BaseX. Custom locks can be assigned via annotations, pragmas or options:
 +
 
 +
* A lock string may consist of a single key or multiple keys separated with commas.
 +
* Internal locks and XQuery locks can co-exist. No conflicts arise, even if a lock string equals the name of a database that is locked by the transaction manager.
 +
* The lock is transformed into a write lock by making the corresponding expression updating.
 +
 
 +
==Annotations==
 +
 
 +
In the following module, lock annotations are used to prevent concurrent write operations on the same file:
 +
 
 +
<syntaxhighlight lang="xquery">
 +
module namespace config = 'config';
  
==Limitations==
+
declare %basex:lock('CONFIG') function config:read() as xs:string {
 +
  file:read-text('config.txt')
 +
};
  
===Commands===
+
declare %updating %basex:lock('CONFIG') function config:write($data as xs:string) {
 +
  file:write-text('config.txt', $data)
 +
};
 +
</syntaxhighlight>
  
Database locking works with all commands unless no glob syntax is used, such as in the following command call:
+
Some explanations:
  
* {{Code|DROP DB new*}}: drop all databases starting with "new"
+
* If a query calls the <code>config:read</code> function, a read lock will be acquired for the user-defined {{Code|CONFIG}} lock string before query evaluation.
 +
* If <code>config:write</code> is called by a query, a write lock will be applied.
 +
* If another query calls <code>config:write</code>, it will be queued until the first query is evaluated.
  
===XQuery===
+
==Pragmas==
  
As XQuery is a very powerful language, deciding which databases will be accessed by a query is non-trivial. Optimization is work in progress.
+
Locks can also be declared via pragmas:
The current identification of which databases to lock is limited to queries that access the currently opened database, XQuery functions that explicitly specify a database, and expressions that address no database at all.
 
  
Some examples on database-locking enabled queries, all of these can be executed in parallel:
+
<syntaxhighlight lang="xquery">
 +
update:output((# basex:lock CONFIG #) {
 +
  file:write('config.xml', <config/>)
 +
})
 +
</syntaxhighlight>
  
* {{Code|//item}}, read-locking of the database opened by a client
+
The write locks is enforced via the {{Code|Update|update:output}}.
* {{Code|doc('factbook')}}, read-locking of "factbook"
 
* {{Code|collection('db/path/to/docs')}}, read-locking of "db"
 
* {{Code|fn:sum(1 to 100)}}, locking nothing at all
 
* {{Code|delete nodes doc('test')//*[string-length(local-name(.)) > 5]}}, write-locking of "test"
 
  
Some examples on queries that are not supported by database-locking yet:
+
==Options==
  
* <code>let $db := 'factbook' return doc($db)</code>, will read-lock: referencing database names isn’t supported yet
+
Locks for the functions of a module can also be assigned via option declarations:
* {{Code|for $db in ('factbook') return doc($db)}}, will read-lock globally
 
* {{Code|doc(doc('test')/reference/text())}}, will read-lock globally
 
* <code>let $db := 'test' return insert nodes <test/> into doc($db)</code>, will write-lock globally
 
  
A list of all locked databases is output if <code>[[Options#QUERYINFO|QUERYINFO]]</code> is set to {{Code|true}}. <!-- and in the GUI's [[GUI#Visualizations|Info View]] --> If you think that too much is locked, please give us a note on our [http://basex.org/open-source/ mailing list] with some example code.
+
<syntaxhighlight lang="xquery">
 +
declare option basex:lock 'CONFIG';
  
===GUI===
+
update:output(file:write('config.xml', <config/>))
 +
</syntaxhighlight>
  
Database locking is currently disabled if the BaseX GUI is used.
+
Once again, a write lock is enforced.
  
==Process Locking==
+
==Java Modules==
  
In order to enable locking on global (process) level, the option <code>[[Options#GLOBALLOCK|GLOBALLOCK]]</code> can be set to {{Code|true}}. This can e.g. be done by editing your {{Code|.basex}} file (see [[Options]] for more details). If process locking is active, a process that performs write operations will queue all other operations.
+
Locks can also be acquired on [[Java Bindings#Locking|Java functions]] which are imported and invoked from an XQuery expression. It is advisable to explicitly lock Java code whenever it performs sensitive read and write operations.
  
 
=File-System Locks=
 
=File-System Locks=
Line 94: Line 121:
 
==Update Operations==
 
==Update Operations==
  
During the term of a database update, a locking file {{Code|upd.basex}} will reside in that database directory. If the update fails for some unexpected reason, or if the process is killed ungracefully, this file may not be deleted. In this case, the database cannot be opened anymore using the default commands, and the message "Database ... is being updated, or update was not completed" will be shown instead. If the locking file is manually removed, you may be able to reopen the database, but you should be aware that database may have got corrupt due to the interrupted update process, and you should revert to the most recent database backup.
+
During a database update, a locking file {{Code|upd.basex}} will reside in that database directory. If the update fails for some unexpected reason, or if the process is killed ungracefully, this file will not be deleted. In this case, the database cannot be opened anymore, and the message "Database ... is being updated, or update was not completed" will be shown instead.  
 +
 
 +
If the locking file is manually removed, you may be able to reopen the database, but you should be aware that database may have got corrupt due to the interrupted update process, and you should revert to the most recent database backup.
  
 
==Database Locks==
 
==Database Locks==
  
To avoid database corruptions caused by write operations running in different JVMs, a shared lock is requested on the database table file ({{Code|tbl.basex}}) whenever a database is opened. If an update operation is triggered, it will be rejected with the message "Database ... is opened by another process." if no exclusive lock can be acquired.
+
To avoid database corruptions that are caused by accidental write operations from different JVMs, a shared lock is requested on the database table file ({{Code|tbl.basex}}) whenever a database is opened. If an update operation is triggered, and if no exclusive lock can be acquired, it will be rejected with the message "Database ... is currently opened by another process.".
  
As the standalone versions of BaseX (command-line, GUI) cannot be synchronized with other BaseX instances, we generally recommend working with the client/server architecture if concurrent write operations are to be performed.
+
Please note that you cannot 100% rely on this mechanism, as it is not possible to synchronize operations across different JVMs. You will be safe when using the client/server or HTTP architecture.
  
 
=Changelog=
 
=Changelog=
 +
 +
;Version 9.4
 +
* Updated: Single lock option for reads and writes.
 +
 +
;Version 9.1
 +
* Updated: Query lock options were moved from {{Code|query}} to {{Code|basex}} namespace.
 +
 +
;Version 8.6
 +
* Updated: New {{Option|FAIRLOCK}} option, improved detection of lock patterns.
  
 
;Version 7.8
 
;Version 7.8
 
 
* Added: Locks can also be acquired on [[Java Bindings#Locking|Java functions]].
 
* Added: Locks can also be acquired on [[Java Bindings#Locking|Java functions]].
  
 
;Version 7.6
 
;Version 7.6
 
 
* Added: database locking introduced, replacing process locking.
 
* Added: database locking introduced, replacing process locking.
  
 
;Version 7.2.1
 
;Version 7.2.1
 
 
* Updated: pin files replaced with shared/exclusive filesystem locking.
 
* Updated: pin files replaced with shared/exclusive filesystem locking.
  
 
;Version 7.2
 
;Version 7.2
 
 
* Added: pin files to mark open databases.
 
* Added: pin files to mark open databases.
  
 
;Version 7.1
 
;Version 7.1
 
 
* Added: update lock files.
 
* Added: update lock files.
 
[[Category:Server]]
 
[[Category:Internals]]
 

Latest revision as of 09:19, 10 February 2021

This article is part of the Advanced User's Guide. The BaseX client-server architecture offers ACID-safe transactions, with multiple readers and writers. This article describes transaction management in more detail.

Introduction

In a nutshell, a transaction is equal to a command or query. So each command or query sent to the server becomes a transaction.

Incoming requests are parsed and checked for errors on the server. If the command or query is not correct, the request will not be executed, and the user will receive an error message. Otherwise, the request becomes a transaction and is passed on to the transaction monitor.

Please note that:

  • Locks cannot be synchronized across BaseX instances that run in different JVMs. If concurrent write operations are to be performed, we generally recommend working with the client/server or the HTTP architecture.
  • An unexpected abort of the server during a transaction, caused by a hardware failure or power cut, may lead to an inconsistent database state if a transaction was active at shutdown time. It is therefore advisable to use the BACKUP command to back up your database regularly. If the worst case occurs, you can try the INSPECT command to check if your database has obvious inconsistencies, and use RESTORE to restore the last backed-up version of the database.

XQuery Update

Many update operations are triggered by XQuery Update expressions. When executing an updating query, all update operations of the query are stored in a pending update list. They will be executed all at once, so the database is updated atomically. If any of the update sub-operations is erroneous, the overall transaction will be aborted.
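The behavior of the pending update list can be illustrated with a small sketch. It assumes a database named test whose root element contains item elements; the database name, the element names and the obsolete attribute are placeholders chosen for this example. All expressions in the query see the database as it was before the query started, and the collected updates are only applied at the end:

```xquery
(: Assumption: a database 'test' exists; all names are placeholders. :)
let $root := db:open('test')/*
return (
  (: both operations are added to the pending update list :)
  insert node <item id="new"/> into $root,
  delete nodes $root/item[@obsolete = 'true'],
  (: this count is computed before the pending updates are applied :)
  update:output(count($root/item))
)
```

If one of the update operations fails, neither the insertion nor the deletion should become visible in the database.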

Concurrency Control

BaseX provides support for multiple read and single write operations (using preclaiming and starvation-free two-phase locking). This means that:

  • Read transactions are executed in parallel.
  • If an updating transaction comes in, it will be queued and executed after all previous read transactions have been executed.
  • Subsequent operations (read or write) will be queued until the updating transaction has completed.
  • Jobs without database access will never be locked. Globally locking jobs can now be executed in parallel with non-locking jobs.
  • Each database has its own queue: An update on database A will not block operations on database B. This is under the premise that it can be statically determined, i.e., before the transaction is evaluated, which databases will be accessed by a transaction (see below).
  • The maximum number of parallel transactions can be adjusted with the PARALLEL option.
  • By default, read transactions are favored, and transactions that access no databases can be evaluated even if the transaction limit has been reached. This behavior can be changed via the FAIRLOCK option.

Limitations

Commands

Database locking works with all commands unless the glob syntax is used, such as in the following command call:

  • DROP DB new*: drop all databases starting with "new"

XQuery

Deciding which databases will be accessed by a complex XQuery expression is a non-trivial task. Database detection works for the following types of queries:

  • //item, read-locking of the database opened by a client
  • doc('factbook'), read-locking of "factbook"
  • collection('db/path/to/docs'), read-locking of "db"
  • delete nodes db:open('test')//*[string-length(local-name(.)) > 5], write-locking of "test"
  • fn:sum(1 to 100) (no lock)

A global lock will be assigned if the name of the database is not a static string:

  • for $db in ('db1', 'db2') return db:open($db)
  • doc(doc('test')/reference/text())
  • let $db := 'test' return insert nodes <test/> into db:open($db)
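A global lock can often be avoided by writing the database names as string literals. As a sketch, the first example above could be rewritten as follows; according to the detection rules described in this section, this should lead to read locks on "db1" and "db2" only:

```xquery
(: Both names are static strings, so the databases can be determined
   before the query is evaluated. :)
(db:open('db1'), db:open('db2'))
```

Whether the rewrite has the intended effect for a particular query can be checked in the query info output.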

The functions fn:doc and fn:collection can also be used to address resources that are not stored in a database. However, this may lead to unwanted locks, and you have two options to reduce the number of locks: No database lookups will take place if the WITHDB option is disabled, or if fetch:xml is used instead of fn:doc.
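A minimal sketch of the second variant, assuming an XML file input.xml on disk that contains item elements (both the path and the element name are placeholders):

```xquery
(: doc('input.xml') may trigger a database lookup and thus a lock;
   fetch:xml retrieves the resource directly. :)
fetch:xml('input.xml')//item
```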

You can consult the query info output (which you find in the Info View of the GUI or which you can turn on by setting QUERYINFO to true) to find out which databases have been locked by a query.

XQuery Locks

By default, access to external resources (files on hard disk, HTTP requests, ...) is not controlled by the transaction monitor of BaseX. Custom locks can be assigned via annotations, pragmas or options:

  • A lock string may consist of a single key or multiple keys separated with commas.
  • Internal locks and XQuery locks can co-exist. No conflicts arise, even if a lock string equals the name of a database that is locked by the transaction manager.
  • The lock is transformed into a write lock by making the corresponding expression updating.

Annotations

In the following module, lock annotations are used to prevent concurrent write operations on the same file:

module namespace config = 'config';

declare %basex:lock('CONFIG') function config:read() as xs:string {
  file:read-text('config.txt')
};

declare %updating %basex:lock('CONFIG') function config:write($data as xs:string) {
  file:write-text('config.txt', $data)
};

Some explanations:

  • If a query calls the config:read function, a read lock will be acquired for the user-defined CONFIG lock string before query evaluation.
  • If config:write is called by a query, a write lock will be applied.
  • If another query calls config:write, it will be queued until the first query is evaluated.
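A query that uses the module might look as follows. The sketch assumes that the module is stored as config.xqm next to the query; the file name and the written string are placeholders:

```xquery
(: Assumption: the module shown above is stored as 'config.xqm'. :)
import module namespace config = 'config' at 'config.xqm';

(: config:write is updating, so a write lock on CONFIG is expected :)
config:write('mode=production')
```

While this query is running, other queries that call config:read or config:write should be queued until it has finished.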

Pragmas

Locks can also be declared via pragmas:

update:output((# basex:lock CONFIG #) {
  file:write('config.xml', <config/>)
})

The write lock is enforced via the update:output function, which makes the expression updating.

Options

Locks for the functions of a module can also be assigned via option declarations:

declare option basex:lock 'CONFIG';

update:output(file:write('config.xml', <config/>))

Once again, a write lock is enforced.

Java Modules

Locks can also be acquired on Java functions which are imported and invoked from an XQuery expression. It is advisable to explicitly lock Java code whenever it performs sensitive read and write operations.

File-System Locks

Update Operations

During a database update, a locking file upd.basex will reside in the database directory. If the update fails for some unexpected reason, or if the process is killed ungracefully, this file will not be deleted. In this case, the database cannot be opened anymore, and the message "Database ... is being updated, or update was not completed" will be shown instead.

If the locking file is manually removed, you may be able to reopen the database, but you should be aware that the database may have been corrupted by the interrupted update process, and you should revert to the most recent database backup.

Database Locks

To avoid database corruption caused by accidental write operations from different JVMs, a shared lock is requested on the database table file (tbl.basex) whenever a database is opened. If an update operation is triggered, and if no exclusive lock can be acquired, it will be rejected with the message "Database ... is currently opened by another process."

Please note that you cannot fully rely on this mechanism, as it is not possible to synchronize operations across different JVMs. You will be safe when using the client/server or HTTP architecture.

Changelog

Version 9.4
  • Updated: Single lock option for reads and writes.
Version 9.1
  • Updated: Query lock options were moved from query to basex namespace.
Version 8.6
  • Updated: New FAIRLOCK option, improved detection of lock patterns.
Version 7.8
  • Added: Locks can also be acquired on Java functions.
Version 7.6
  • Added: database locking introduced, replacing process locking.
Version 7.2.1
  • Updated: pin files replaced with shared/exclusive filesystem locking.
Version 7.2
  • Added: pin files to mark open databases.
Version 7.1
  • Added: update lock files.