Difference between revisions of "Transaction Management"

m (Text replacement - "<syntaxhighlight lang="xquery">" to "<pre lang='xquery'>")
 
(147 intermediate revisions by 9 users not shown)
Line 1: Line 1:
==ACID Properties==
+
This article is part of the [[Advanced User's Guide]].
 +
The BaseX client-server architecture offers ACID-safe transactions,
 +
with multiple readers and writers. Here is some more
 +
information about the transaction management.
  
*Atomicity
+
=Introduction=
*Consistency
 
*Isolation
 
*Durability
 
  
==Transaction processing==
+
In a nutshell, a transaction is equal to a command or query. So each command or query sent to the server becomes a transaction.
  
Incoming requests are parsed and checked for errors on the server.
+
Incoming requests are parsed and checked for errors on the server. If the command or query is not correct, the request will not be executed, and the user will receive an error message. Otherwise the request becomes a transaction and gets into the transaction monitor.
If the query is not correct, the transaction will not be executed,
 
and the user will receive an error message.
 
  
When executing a transaction, all updates are stored in an update
+
Please note that:
list. They will be executed all at once, so the database is
 
updated atomically. If any of the update subtransactions is
 
erroneous the overall transaction will be aborted.
 
  
The concurrency control checks for each transaction, which will
+
* Locks ''cannot be synchronized'' across BaseX instances that run in different JVMs. If concurrent write operations are to be performed, we generally recommend working with the client/server or the HTTP architecture.
perform a read or write operation on the database, the status of
+
* An ''unexpected abort'' of the server during a transaction, caused by a hardware failure or power cut, may lead to an inconsistent database state if a transaction was active at shutdown time. It is advisable to use the {{Command|CREATE BACKUP}} command to regularly back up your database. If the worst case occurs, you can try the {{Command|INSPECT}} command to check if your database has obvious inconsistencies, and use {{Command|RESTORE}} to restore the last backed up version of the database.
the lock object and decides whether the isolation is guaranteed
 
for that transaction.  
 
If this is the case, the transaction will be started immediately.
 
Otherwise, the transaction enters a waiting mode.
 
  
 +
==XQuery Update==
  
With the introduction of XQuery Update the complexity of the
+
Many update operations are triggered by [[Update|XQuery Update]] expressions. When executing an updating query, all update operations of the query are stored in a pending update list. They will be executed all at once, so the database is updated atomically. If any of the update sub-operations is erroneous, the overall transaction will be aborted.
update operations increased, so now it's possible to address
 
numerous databases and execute updates on them.  
 
Updates that use multiple databases cannot be executed
 
with a simple data lock. That's why BaseX uses a special lock
 
object, which controls the execution of the server process.
 
This has the advantage that the used databases need not to be
 
known and the correct execution is still granted. The
 
disadvantage is that write operations are executed sequentially.
 
Read-only operations are executed in parallel.
 
  
For these reasons, a waiting list is used, which ensures that all processes are
+
=Concurrency Control=
treated equally and that they are executed according to their sequence.
 
This corresponds to the FIFO principle ('First-In First-Out'), which states that
 
the first process that arrives at the server will be the first one that will
 
be executed. The FIFO principle cannot be adhered to within a group of reading
 
transactions, as they run in different threads and thus can overtake each other.
 
  
The use of the monitor also prevents the system from deadlocks, because the
+
BaseX provides support for multiple read and single write operations (using preclaiming and starvation-free two phase locking). This means that:
critical resource is only assigned to one writing transaction resp. a group of
 
reading transactions. So there is only one active writing transaction.
 
  
 +
* Read transactions are executed in parallel.
 +
* If an updating transaction comes in, it will be queued and executed after all previous read transactions have been executed.
 +
* Subsequent operations (read or write) will be queued until the updating transaction has completed.
 +
* Jobs without database access will never be locked. Globally locking jobs can now be executed in parallel with non-locking jobs.
 +
* Each database has its own queue: An update on database A will not block operations on database B. This is under the premise that it can be statically determined, i.e., before the transaction is evaluated, which databases will be accessed by a transaction (see [[#Limitations|below]]).
 +
* The maximum number of parallel transactions can be adjusted with the {{Option|PARALLEL}} option.
 +
* By default, read transactions are favored, and transactions that access no databases can be evaluated even if the transaction limit has been reached. This behavior can be changed via the {{Option|FAIRLOCK}} option.
  
N.B.
+
==Limitations==
An abort of the server during a transaction will inevitably lead
 
to an inconsistent database. A rollback of the transaction would
 
prevent such an undesirable database state. Unfortunately,
 
this feature is not <b>yet</b> available in the current version of BaseX.
 
  
[[Category:Server]]
+
===Commands===
[[Category:Internal]]
+
 
[[Category:Finish]]
+
All commands come with a detector for local locks. Global locking is applied if the glob syntax is used:
 +
 
 +
* {{Code|DROP DB new*}}: Drop all databases starting with the prefix string {{Code|new}}.
 +
 
 +
===XQuery===
 +
 
 +
Since BaseX 10, the [[BaseX 10#Compilation|lock detection has been fundamentally improved]], by splitting compilation into multiple steps.
 +
 
 +
Local locks can be applied if it is possible after compile time to associate all database operations with static database names:
 +
 
 +
{| class="wikitable"
 +
|- valign="top"
 +
! Query
 +
! Description
 +
|- valign="top"
 +
| <code>//item</code>
 +
| Read lock of the currently opened database
 +
|- valign="top"
 +
| <code>doc('factbook')</code>
 +
| Read lock of the {{Code|factbook}} database
 +
|- valign="top"
 +
| <code>collection('documents/path/to/docs')</code>
 +
| Read lock of the {{Code|documents}} database
 +
|- valign="top"
 +
| <code>delete nodes db:get('test')//*[@type = 'misc']</code>
 +
| Write lock of the {{Code|test}} database
 +
|- valign="top"
 +
| <code>declare variable $db external;<br/>db:get($db)</code>
 +
| Read lock of the database externally bound to {{Code|$db}}.
 +
|- valign="top"
 +
| <code>for $db in ('db1', 'db2')<br/>return db:get($db)</code>
 +
| Read lock of {{Code|db1}} and {{Code|db2}}, as the query is [[XQuery Optimizations#Loop Unrolling|unrolled at compile time]].
 +
|- valign="top"
 +
| <code>let $db := 'test'<br/>return insert nodes <test/> into db:get($db)</code>
 +
| Write lock of {{Code|test}}, as the [[XQuery Optimizations#Variable_Inlining|variable is inlined]] at compile time.
 +
|- valign="top"
 +
| <code>sum(1 to 100)</code>
 +
| No lock required
 +
|- valign="top"
 +
| <code>declare variable $SIMULATE := true();<br/>if($SIMULATE) then <doc/> else db:get('doc')</code>
 +
| No lock required, as the query is simplified to {{Code|<doc/>}} at compile time.
 +
|}
 +
 
 +
A global lock will be assigned if the static detection fails:
 +
 
 +
{| class="wikitable"
 +
|- valign="top"
 +
! Query
 +
! Description
 +
|- valign="top"
 +
| <code>db:get(doc('test')/reference/text())</code>
 +
| The name of the database to be opened will only be known at evaluation time.
 +
|- valign="top"
 +
| <code>(1 to 100) ! db:get(concat('db', .))</code>
 +
| The {{Option|UNROLLLIMIT}} can be increased to generate 100 {{Code|db:get}} function calls and corresponding locks.
 +
|}
 +
 
 +
The functions {{Code|fn:doc}} and {{Code|fn:collection}} can be used both for accessing database resources and for fetching resources at the specified URI (see [[Databases#Access Resources|Access Resources]] for more details). There are two ways to reduce the number of locks:
 +
 
 +
# Turn off the {{Option|WITHDB}} option to prevent the functions from accessing databases; or
 +
# use {{Function|Fetch|fetch:doc}} for fetching resources from URIs, and use {{Function|Database|db:get}} for accessing databases.
 +
 
 +
You can consult the query info output (via the [[GUI#Visualizations|Info View]] of the GUI, via {{Code|-V}} on [[Command-Line]] or via turning on the {{Option|QUERYINFO}} option) to find out which databases are locked by a query, and if local locks or a global lock is applied.
 +
 
 +
=XQuery Locks=
 +
 
 +
By default, access to external resources (files on hard disk, HTTP requests, …) is not controlled by the transaction monitor of BaseX. Custom locks can be assigned via annotations, pragmas or options:
 +
 
 +
* A lock string may consist of a single key or multiple keys separated with commas.
 +
* Internal locks and XQuery locks can co-exist. No conflicts arise, even if a lock string equals the name of a database that is locked by the transaction manager.
 +
* The lock is transformed into a write lock by making the corresponding expression updating.
 +
 
 +
==Annotations==
 +
 
 +
In the following module, lock annotations are used to prevent concurrent write operations on the same file:
 +
 
 +
<pre lang='xquery'>
 +
module namespace config = 'config';
 +
 
 +
declare %basex:lock('CONFIG') function config:read() as xs:string {
 +
  file:read-text('config.txt')
 +
};
 +
 
 +
declare %updating %basex:lock('CONFIG') function config:write($data as xs:string) {
 +
  file:write-text('config.txt', $data)
 +
};
 +
</pre>
 +
 
 +
Some explanations:
 +
 
 +
* If a query calls a reading <code>basex:lock</code> function, a read lock will be acquired for the user-defined {{Code|CONFIG}} lock string before query evaluation.
 +
* If an updating <code>basex:lock</code> function is called by a query, a write lock will be applied.
 +
* If another query calls a <code>basex:lock</code> function, it will be queued until the first query is evaluated.
 +
 
 +
==Pragmas==
 +
 
 +
Locks can also be declared via pragmas:
 +
 
 +
<pre lang='xquery'>
 +
update:output((# basex:lock CONFIG #) {
 +
  file:write('config.xml', <config/>)
 +
})
 +
</pre>
 +
 
 +
The write lock is enforced via {{Function|Update|update:output}}.
 +
 
 +
==Options==
 +
 
 +
Locks for the functions of a module can also be assigned via option declarations:
 +
 
 +
<pre lang='xquery'>
 +
declare option basex:lock 'CONFIG';
 +
 
 +
update:output(file:write('config.xml', <config/>))
 +
</pre>
 +
 
 +
Once again, a write lock is enforced.
 +
 
 +
==Java Modules==
 +
 
 +
Locks can also be acquired on [[Java Bindings#Locking|Java functions]] which are imported and invoked from an XQuery expression. It is advisable to explicitly lock Java code whenever it performs sensitive read and write operations.
 +
 
 +
=File-System Locks=
 +
 
 +
==Update Operations==
 +
 
 +
During a database update, a locking file {{Code|upd.basex}} will reside in that database directory. If the update fails for some unexpected reason, or if the process is killed ungracefully, this file will not be deleted. In this case, the database cannot be opened anymore, and the message "Database ... is being updated, or update was not completed" will be shown instead.
 +
 
 +
If the locking file is manually removed, you may be able to reopen the database, but you should be aware that the database may have been corrupted by the interrupted update process, and you should revert to the most recent database backup.
 +
 
 +
==Database Locks==
 +
 
 +
To avoid database corruptions that are caused by accidental write operations from different JVMs, a shared lock is requested on the database table file ({{Code|tbl.basex}}) whenever a database is opened. If an update operation is triggered, and if no exclusive lock can be acquired, it will be rejected with the message "Database ... is currently opened by another process.".
 +
 
 +
Please note that you cannot 100% rely on this mechanism, as it is not possible to synchronize operations across different JVMs. You will be safe when using the client/server or HTTP architecture.
 +
 
 +
=Changelog=
 +
 
 +
;Version 10.0
 +
* Updated: Lock detection was improved by splitting compilation into multiple steps.
 +
 
 +
;Version 9.4
 +
* Updated: Single lock option for reads and writes.
 +
 
 +
;Version 9.1
 +
* Updated: Query lock options were moved from {{Code|query}} to {{Code|basex}} namespace.
 +
 
 +
;Version 8.6
 +
* Updated: New {{Option|FAIRLOCK}} option, improved detection of lock patterns.
 +
 
 +
;Version 7.8
 +
* Added: Locks can also be acquired on [[Java Bindings#Locking|Java functions]].
 +
 
 +
;Version 7.6
 +
* Added: database locking introduced, replacing process locking.
 +
 
 +
;Version 7.2.1
 +
* Updated: pin files replaced with shared/exclusive filesystem locking.
 +
 
 +
;Version 7.2
 +
* Added: pin files to mark open databases.
 +
 
 +
;Version 7.1
 +
* Added: update lock files.

Latest revision as of 17:36, 1 December 2023

This article is part of the Advanced User's Guide. The BaseX client/server architecture offers ACID-safe transactions with multiple readers and writers. This page gives more information about transaction management.

Introduction

In a nutshell, a transaction is equal to a command or query. So each command or query sent to the server becomes a transaction.

Incoming requests are parsed and checked for errors on the server. If the command or query is not correct, the request will not be executed, and the user will receive an error message. Otherwise the request becomes a transaction and gets into the transaction monitor.

Please note that:

  • Locks cannot be synchronized across BaseX instances that run in different JVMs. If concurrent write operations are to be performed, we generally recommend working with the client/server or the HTTP architecture.
  • An unexpected abort of the server during a transaction, caused by a hardware failure or power cut, may lead to an inconsistent database state if a transaction was active at shutdown time. It is advisable to use the CREATE BACKUP command to regularly back up your database. If the worst case occurs, you can try the INSPECT command to check if your database has obvious inconsistencies, and use RESTORE to restore the last backed up version of the database.
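
The backup advice above can also be followed from within a query. This is a minimal sketch: the database name factbook is only an example, and db:create-backup is an updating function, so the backup is created when the pending updates of the query are applied.

```xquery
(: Create a backup of the (example) database 'factbook'. :)
db:create-backup('factbook')
```

A backup created this way can later be restored with db:restore('factbook'), which corresponds to the RESTORE command.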

XQuery Update

Many update operations are triggered by XQuery Update expressions. When executing an updating query, all update operations of the query are stored in a pending update list. They will be executed all at once, so the database is updated atomically. If any of the update sub-operations is erroneous, the overall transaction will be aborted.
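
A minimal sketch of this behavior, assuming that a database named test exists (the name is hypothetical): both operations are collected in the pending update list and are applied together after the query has been evaluated.

```xquery
(: Two update operations per root element: either all are applied, or none. :)
for $root in db:get('test')/*
return (
  insert node <checked/> into $root,
  delete nodes $root//*[@type = 'misc']
)
```

If one of the operations fails, for example because the same node is the target of two rename expressions (error XUDY0015), none of the updates is written to the database.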

Concurrency Control

BaseX provides support for multiple read and single write operations (using preclaiming and starvation-free two phase locking). This means that:

  • Read transactions are executed in parallel.
  • If an updating transaction comes in, it will be queued and executed after all previous read transactions have been executed.
  • Subsequent operations (read or write) will be queued until the updating transaction has completed.
  • Jobs without database access will never be locked. Globally locking jobs can now be executed in parallel with non-locking jobs.
  • Each database has its own queue: An update on database A will not block operations on database B. This is under the premise that it can be statically determined, i.e., before the transaction is evaluated, which databases will be accessed by a transaction (see below).
  • The maximum number of parallel transactions can be adjusted with the PARALLEL option.
  • By default, read transactions are favored, and transactions that access no databases can be evaluated even if the transaction limit has been reached. This behavior can be changed via the FAIRLOCK option.

Limitations

Commands

All commands come with a detector for local locks. Global locking is applied if the glob syntax is used:

  • DROP DB new*: Drop all databases starting with the prefix string new.

XQuery

Since BaseX 10, the lock detection has been fundamentally improved, by splitting compilation into multiple steps.

Local locks can be applied if it is possible after compile time to associate all database operations with static database names:

Query | Description
//item | Read lock of the currently opened database
doc('factbook') | Read lock of the factbook database
collection('documents/path/to/docs') | Read lock of the documents database
delete nodes db:get('test')//*[@type = 'misc'] | Write lock of the test database
declare variable $db external; db:get($db) | Read lock of the database externally bound to $db
for $db in ('db1', 'db2') return db:get($db) | Read lock of db1 and db2, as the query is unrolled at compile time
let $db := 'test' return insert nodes <test/> into db:get($db) | Write lock of test, as the variable is inlined at compile time
sum(1 to 100) | No lock required
declare variable $SIMULATE := true(); if($SIMULATE) then <doc/> else db:get('doc') | No lock required, as the query is simplified to <doc/> at compile time

A global lock will be assigned if the static detection fails:

Query | Description
db:get(doc('test')/reference/text()) | The name of the database to be opened will only be known at evaluation time.
(1 to 100) ! db:get(concat('db', .)) | The UNROLLLIMIT option can be increased to generate 100 db:get function calls and corresponding locks.

The functions fn:doc and fn:collection can be used both for accessing database resources and for fetching resources at the specified URI (see Access Resources for more details). There are two ways to reduce the number of locks:

  1. Turn off the WITHDB option to prevent the functions from accessing databases; or
  2. use fetch:doc for fetching resources from URIs, and use db:get for accessing databases.
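
The second approach could look as follows. The URI, the database name and the element names are placeholders; the point is that each function call has exactly one meaning, so only the factbook database needs to be locked.

```xquery
(: fetch:doc only retrieves the resource from the URI; no database is involved. :)
let $remote := fetch:doc('https://example.org/countries.xml')
(: db:get names the database statically: a local read lock on 'factbook'. :)
let $local := db:get('factbook')
return count($remote//country) - count($local//country)
```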

You can consult the query info output (via the Info View of the GUI, via -V on the command line, or by turning on the QUERYINFO option) to find out which databases are locked by a query, and whether local locks or a global lock are applied.

XQuery Locks

By default, access to external resources (files on hard disk, HTTP requests, …) is not controlled by the transaction monitor of BaseX. Custom locks can be assigned via annotations, pragmas or options:

  • A lock string may consist of a single key or multiple keys separated with commas.
  • Internal locks and XQuery locks can co-exist. No conflicts arise, even if a lock string equals the name of a database that is locked by the transaction manager.
  • The lock is transformed into a write lock by making the corresponding expression updating.

Annotations

In the following module, lock annotations are used to prevent concurrent write operations on the same file:

module namespace config = 'config';

declare %basex:lock('CONFIG') function config:read() as xs:string {
  file:read-text('config.txt')
};

declare %updating %basex:lock('CONFIG') function config:write($data as xs:string) {
  file:write-text('config.txt', $data)
};

Some explanations:

  • If a query calls a reading basex:lock function, a read lock will be acquired for the user-defined CONFIG lock string before query evaluation.
  • If an updating basex:lock function is called by a query, a write lock will be applied.
  • If another query calls a basex:lock function, it will be queued until the first query is evaluated.
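
A query that uses the module might look like this. The file name config.xqm and the string that is written are assumptions made for illustration; the module is expected to be stored next to the query.

```xquery
(: Import the module shown above (hypothetical location). :)
import module namespace config = 'config' at 'config.xqm';

(: Updating function call: a write lock on CONFIG is acquired before evaluation. :)
config:write('timeout=30')
```

A second query that only calls config:read() requests a read lock for the same key and is queued as long as the write operation is in progress.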

Pragmas

Locks can also be declared via pragmas:

update:output((# basex:lock CONFIG #) {
  file:write('config.xml', <config/>)
})

The write lock is enforced via update:output.
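
For comparison, a sketch of the non-updating case: without update:output, the expression is not updating, so the same pragma requests a read lock. The file name is the one used in the annotation example.

```xquery
(: Non-updating expression: CONFIG is locked for reading only. :)
(# basex:lock CONFIG #) {
  file:read-text('config.txt')
}
```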

Options

Locks for the functions of a module can also be assigned via option declarations:

declare option basex:lock 'CONFIG';

update:output(file:write('config.xml', <config/>))

Once again, a write lock is enforced.
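
Several keys can be combined in one declaration, separated by commas. In the following sketch, LOG and the file log.txt are invented names; as the query is not updating, read locks are requested for both keys.

```xquery
(: Two user-defined lock keys in a single lock string. :)
declare option basex:lock 'CONFIG,LOG';

(: Nothing is updated, so both keys are locked for reading. :)
(file:read-text('config.txt'), file:read-text('log.txt'))
```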

Java Modules

Locks can also be acquired on Java functions which are imported and invoked from an XQuery expression. It is advisable to explicitly lock Java code whenever it performs sensitive read and write operations.

File-System Locks

Update Operations

During a database update, a locking file upd.basex will reside in that database directory. If the update fails for some unexpected reason, or if the process is killed ungracefully, this file will not be deleted. In this case, the database cannot be opened anymore, and the message "Database ... is being updated, or update was not completed" will be shown instead.

If the locking file is manually removed, you may be able to reopen the database, but you should be aware that the database may have been corrupted by the interrupted update process, and you should revert to the most recent database backup.

Database Locks

To avoid database corruptions that are caused by accidental write operations from different JVMs, a shared lock is requested on the database table file (tbl.basex) whenever a database is opened. If an update operation is triggered, and if no exclusive lock can be acquired, it will be rejected with the message "Database ... is currently opened by another process.".

Please note that you cannot 100% rely on this mechanism, as it is not possible to synchronize operations across different JVMs. You will be safe when using the client/server or HTTP architecture.

Changelog

Version 10.0
  • Updated: Lock detection was improved by splitting compilation into multiple steps.
Version 9.4
  • Updated: Single lock option for reads and writes.
Version 9.1
  • Updated: Query lock options were moved from query to basex namespace.
Version 8.6
  • Updated: New FAIRLOCK option, improved detection of lock patterns.
Version 7.8
  • Added: Locks can also be acquired on Java functions.
Version 7.6
  • Added: database locking introduced, replacing process locking.
Version 7.2.1
  • Updated: pin files replaced with shared/exclusive filesystem locking.
Version 7.2
  • Added: pin files to mark open databases.
Version 7.1
  • Added: update lock files.