Difference between revisions of "XQuery Extensions"

From BaseX Documentation
(30 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
This article is part of the [[XQuery|XQuery Portal]]. It lists extensions and optimizations that are specific to the BaseX XQuery processor.
 
This article is part of the [[XQuery|XQuery Portal]]. It lists extensions and optimizations that are specific to the BaseX XQuery processor.
  
=Suffixes=
+
=Option Declarations=
  
In BaseX, files with the suffixes {{Code|.xq}}, {{Code|.xqm}}, {{Code|.xqy}}, {{Code|.xql}}, {{Code|.xqu}} and {{Code|.xquery}} are treated as XQuery files. In XQuery, there are main and library modules:
+
==Database Options==
  
* Main modules have an expression as query body. Here is a minimum example:
+
[[Options|Local database options]] can be set in the prolog of an XQuery main module. In the option declaration, options need to be bound to the [[Database Module]] namespace. All values will be reset after the evaluation of a query:
  
 
<pre class="brush:xquery">
 
<pre class="brush:xquery">
'Hello World!'
+
declare option db:chop 'false';
 +
doc('doc.xml')
 
</pre>
 
</pre>
  
* Library modules start with a module namespace declaration and have no query body:
+
==XQuery Locks==
 +
 
 +
If [[Transactions#XQuery_Locks|XQuery Locks]] are defined in the query prolog of a module, access to functions of this module locks will be controlled by the central transaction management.
 +
 
 +
If the following XQuery code is called by two clients in parallel, the queries will be evaluated one after another:
  
 
<pre class="brush:xquery">
 
<pre class="brush:xquery">
module namespace hello = 'http://basex.org/examples/hello';
+
declare option basex:write-lock 'CONFIGLOCK';
 +
file:write('config.xml', <config/>)
 +
</pre>
 +
 
 +
=Pragmas=
 +
 
 +
==BaseX Pragmas==
 +
 
 +
Many optimizations in BaseX will only be performed if an expression is ''deterministic'' (i. e., if it always yields the same output and does not have side effects). By flagging an expression as non-deterministic, optimizations and query rewritings can be suppressed:
  
declare function hello:world() {
+
<pre class="brush:xquery">
   'Hello World!'
+
sum( (# basex:non-deterministic #) {
};
+
   1 to 100000000
 +
})
 
</pre>
 
</pre>
  
We recommend {{Code|.xq}} as suffix for for main modules, and {{Code|.xqm}} for library modules. However, the actual module type will dynamically be detected when a file is opened and parsed.
+
This pragma can be helpful when debugging your code.
  
=Option Declarations=
+
{{Mark|Introduced with Version 9.1:}}
  
[[Options|Local database options]] can be set in the prolog of an XQuery main module. In the option declaration, options need to be bound to the [[Database Module]] namespace. All values will be reset after the evaluation of a query:
+
In analogy with option declarations and function annotations, [[Transactions#XQuery_Locks|XQuery Locks]] can also set via pragmas:
  
 
<pre class="brush:xquery">
 
<pre class="brush:xquery">
declare option db:chop 'false';
+
(# basex:write-lock CONFIGLOCK #) {
doc('doc.xml')
+
  file:write('config.xml', <config/>)
 +
}
 
</pre>
 
</pre>
  
=Pragmas=
+
==Database Pragmas==
 +
 
 +
All [[Options|local options]] can be assigned via pragmas. Some examples:
  
[[Options|Local database options]] can be assigned locally via pragmas:
+
* Enforce query to be rewritten for index access. This can e. g. be helpful if the name of a database is not static (see [[Indexes#Enforce Rewritings|Enforce Rewritings]] for more examples):
  
 
<pre class="brush:xquery">
 
<pre class="brush:xquery">
(# db:chop false #) { doc('doc.xml') }
+
(# db:enforceindex #) {
 +
  for $db in ('persons1', 'persons2', 'persons3')
 +
  return db:open($db)//name[text() = 'John']
 +
}
 
</pre>
 
</pre>
  
Various optimizations can be disabled by marking an expression as non-deterministic:
+
* Temporarily disable node copying in node constructors (see {{Option|COPYNODE}} for more details). The following query will be evaluated faster, and take much less memory, than without pragma, because the database nodes will not be fully copied, but only attached to the new {{Code|xml}} parent element:
  
 
<pre class="brush:xquery">
 
<pre class="brush:xquery">
count( (# basex:non-deterministic #) { 1 to 10 })
+
file:write(
 +
  'wrapped-db-nodes.xml',
 +
  (# db:copynode false #) {
 +
    <xml>{ db:open('huge') }</xml>
 +
  }
 +
)
 
</pre>
 
</pre>
  
 
=Annotations=
 
=Annotations=
  
The following implementation-defined annotations are available:
+
==Function Inlining==
 +
 
 +
{{Code|%basex:inline([limit])}} controls if functions will be inlined.
 +
 
 +
If XQuery functions are ''inlined'', the function call will be replaced by a FLWOR expression, in which the function variables are bound to let clauses, and in which the function body is returned. This optimization triggers further query rewritings that will speed up your query. An example:
 +
 
 +
'''Query:'''
 +
 
 +
<pre class="brush:xquery">
 +
declare function local:square($a) { $a * $a };
 +
for $i in 1 to 3
 +
return local:square($i)
 +
</pre>
 +
 
 +
'''Query after function inlining:'''
 +
 
 +
<pre class="brush:xquery">
 +
for $i in 1 to 3
 +
return
 +
  let $a := $i
 +
  return $a * $a
 +
</pre>
 +
 
 +
'''Query after further optimizations:'''
 +
 
 +
<pre class="brush:xquery">
 +
for $i in 1 to 3
 +
return $i * $i
 +
</pre>
 +
 
 +
By default, XQuery functions will be ''inlined'' if the query body is not too large and does not exceed a fixed number of expressions, which can be adjusted via the {{Option|INLINELIMIT}} option.
  
* {{Code|%basex:inline([limit])}} enforces the inlining of a function. The inlining limit can be specified as argument. Inlining can be disabled by specifying {{Code|0}}.
+
The annotation can be used to overwrite this global limit: Function inlining can be enforced if no argument is specified. Inlining will be disabled if {{Code|0}} is specified.
  
 
'''Example:'''
 
'''Example:'''
  
 
<pre class="brush:xquery">
 
<pre class="brush:xquery">
 +
(: disable function inlining; the full stack trace will be shown... :)
 
declare %basex:inline(0) function local:e() { error() };
 
declare %basex:inline(0) function local:e() { error() };
 
local:e()
 
local:e()
Line 69: Line 125:
 
</pre>
 
</pre>
  
In the next example, function inlining was disabled globally by assigning {{Code|0}} to the [[Options#INLINELIMIT|INLINELIMIT]] option. However, annotation is enforced to a single function:
+
==Lazy Evaluation==
 +
 
 +
{{Code|%basex:lazy}} enforces lazy evaluation of a global variable. An example:
 +
 
 +
'''Example:'''
 +
<pre class="brush:xquery">
 +
declare %basex:lazy variable $january := doc('does-not-exist');
 +
if(month-from-date(current-date()) = 1) then $january else ()
 +
</pre>
 +
 
 +
The annotation ensures that an error will only be raised if the condition yields true. Without the annotation, the error will always be raised, because the referenced document is not found.
 +
 
 +
==XQuery Locks==
 +
 
 +
{{Mark|Introduced with Version 9.1:}}
 +
 
 +
In analogy with option declarations and pragmas, [[Transactions#XQuery_Locks|XQuery Locks]] can also set via annotations:
  
'''Example:'''
 
 
<pre class="brush:xquery">
 
<pre class="brush:xquery">
declare option db:inlinelimit '0';
+
declare %basex:write-lock('CONFIGLOCK') function local:write() {
declare %basex:inline function local:id($x) { $x };
+
  file:write('config.xml', <config/>)
local:id(123)
+
};
 
</pre>
 
</pre>
  
The query will be optimized to {{Code|123}}.
+
=Functions=
  
* {{Code|%basex:lazy}} enforces the lazy evaluation of a global variable. Example:
+
==Regular Expressions==
 +
 
 +
{{Mark|Introduced with Version 9.1:}}
 +
 
 +
In analogy with Saxon, you can specify the flag {{Code|j}} to revert to Java’s default regex parser. For example, this allows you to use the word boundary option {{Code|\b}}, which has not been included in the XQuery grammar for regular expressions:
  
 
'''Example:'''  
 
'''Example:'''  
 
<pre class="brush:xquery">
 
<pre class="brush:xquery">
declare %basex:lazy variable $january := doc('does-not-exist');
+
(: yields "!Hi! !there!" :)
if(month-from-date(current-date()) == 1) then $january else ()
+
replace('Hi there', '\b', '!', 'j')
 
</pre>
 
</pre>
 
The annotation ensures that an error will only be thrown if the condition yields true. Without the annotation, the error will always be thrown, because the referenced document is not found.
 
  
 
=Serialization=
 
=Serialization=
Line 97: Line 170:
 
For more information and some additional BaseX-specific parameters, see the article on [[Serialization]].
 
For more information and some additional BaseX-specific parameters, see the article on [[Serialization]].
  
=Non-determinism=
+
=Non-Determinism=
  
 
In [http://www.w3.org/TR/xpath-functions-31/#dt-deterministic XQuery], ''deterministic'' functions are “guaranteed to produce ·identical· results from repeated calls within a single ·execution scope· if the explicit and implicit arguments are identical”. In BaseX, many extension functions are non-deterministic or side-effecting. If an expression is internally flagged as non-deterministic, various optimizations that might change their execution order will not be applied.
 
In [http://www.w3.org/TR/xpath-functions-31/#dt-deterministic XQuery], ''deterministic'' functions are “guaranteed to produce ·identical· results from repeated calls within a single ·execution scope· if the explicit and implicit arguments are identical”. In BaseX, many extension functions are non-deterministic or side-effecting. If an expression is internally flagged as non-deterministic, various optimizations that might change their execution order will not be applied.
Line 126: Line 199:
  
 
Two non-deterministic functions will be bound to <code>$read</code>, and the result of the function call will be bound to <code>$ignored</code>. As the variable is not referenced in the subsequent code, the let clause would usually be discarded by the compiler. In the given query, however, execution will be enforced because of the BaseX-specific {{Code|non-deterministic}} keyword.
 
Two non-deterministic functions will be bound to <code>$read</code>, and the result of the function call will be bound to <code>$ignored</code>. As the variable is not referenced in the subsequent code, the let clause would usually be discarded by the compiler. In the given query, however, execution will be enforced because of the BaseX-specific {{Code|non-deterministic}} keyword.
 +
 +
=Suffixes=
 +
 +
In BaseX, files with the suffixes {{Code|.xq}}, {{Code|.xqm}}, {{Code|.xqy}}, {{Code|.xql}}, {{Code|.xqu}} and {{Code|.xquery}} are treated as XQuery files. In XQuery, there are main and library modules:
 +
 +
* Main modules have an expression as query body. Here is a minimum example:
 +
 +
<pre class="brush:xquery">
 +
'Hello World!'
 +
</pre>
 +
 +
* Library modules start with a module namespace declaration and have no query body:
 +
 +
<pre class="brush:xquery">
 +
module namespace hello = 'http://basex.org/examples/hello';
 +
 +
declare function hello:world() {
 +
  'Hello World!'
 +
};
 +
</pre>
 +
 +
We recommend {{Code|.xq}} as suffix for for main modules, and {{Code|.xqm}} for library modules. However, the actual module type will dynamically be detected when a file is opened and parsed.
  
 
=Miscellaneous=
 
=Miscellaneous=
  
 
Various other extensions are described in the articles on [[Full-Text#BaseX Features|XQuery Full Text]] and [[Updates|XQuery Update]].
 
Various other extensions are described in the articles on [[Full-Text#BaseX Features|XQuery Full Text]] and [[Updates|XQuery Update]].
 +
 +
=Changelog=
 +
 +
;Version 9.1:
 +
 +
* Added: XQuery Locks via pragmas and function annotations.
 +
* Added: [[#Regular expressions|Regular Expressions]], {{Code|j}} flag for using Java’s default regex parser.

Revision as of 18:20, 20 August 2018

This article is part of the XQuery Portal. It lists extensions and optimizations that are specific to the BaseX XQuery processor.

Option Declarations

Database Options

Local database options can be set in the prolog of an XQuery main module. In the option declaration, options need to be bound to the Database Module namespace. All values will be reset after the evaluation of a query:

declare option db:chop 'false';
doc('doc.xml')

XQuery Locks

If XQuery Locks are defined in the query prolog of a module, access to the functions of this module will be controlled by the central transaction management.

If the following XQuery code is called by two clients in parallel, the queries will be evaluated one after another:

declare option basex:write-lock 'CONFIGLOCK';
file:write('config.xml', <config/>)

Pragmas

BaseX Pragmas

Many optimizations in BaseX will only be performed if an expression is deterministic (i.e., if it always yields the same output and does not have side effects). By flagging an expression as non-deterministic, optimizations and query rewritings can be suppressed:

sum( (# basex:non-deterministic #) {
  1 to 100000000
})

This pragma can be helpful when debugging your code.

Introduced with Version 9.1:

In analogy with option declarations and function annotations, XQuery Locks can also be set via pragmas:

(# basex:write-lock CONFIGLOCK #) {
  file:write('config.xml', <config/>)
}

Database Pragmas

All local options can be assigned via pragmas. Some examples:

  • Enforce the query to be rewritten for index access. This can be helpful, e.g., if the name of a database is not static (see Enforce Rewritings for more examples):
(# db:enforceindex #) {
  for $db in ('persons1', 'persons2', 'persons3')
  return db:open($db)//name[text() = 'John']
}
  • Temporarily disable node copying in node constructors (see COPYNODE for more details). The following query will be evaluated faster, and take much less memory, than without the pragma, because the database nodes will not be fully copied, but only attached to the new xml parent element:
file:write(
  'wrapped-db-nodes.xml',
  (# db:copynode false #) {
    <xml>{ db:open('huge') }</xml>
  }
)
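
The same mechanism applies to other local options. As a minimal sketch, the CHOP option from the option declaration example earlier in this article can be limited to a single expression instead of the whole query (doc.xml is the placeholder document name used there):

```xquery
(: CHOP is set to false only for the enclosed expression :)
(# db:chop false #) { doc('doc.xml') }
```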

Annotations

Function Inlining

%basex:inline([limit]) controls if functions will be inlined.

If XQuery functions are inlined, the function call will be replaced by a FLWOR expression, in which the function variables are bound to let clauses, and in which the function body is returned. This optimization triggers further query rewritings that will speed up your query. An example:

Query:

declare function local:square($a) { $a * $a };
for $i in 1 to 3
return local:square($i)

Query after function inlining:

for $i in 1 to 3
return
  let $a := $i
  return $a * $a

Query after further optimizations:

for $i in 1 to 3
return $i * $i

By default, XQuery functions will be inlined if the function body is not too large and does not exceed a fixed number of expressions, which can be adjusted via the INLINELIMIT option.

The annotation can be used to override this global limit: function inlining is enforced if no argument is specified, and it is disabled if 0 is specified.

Example:

(: disable function inlining; the full stack trace will be shown... :)
declare %basex:inline(0) function local:e() { error() };
local:e()

Result:

Stopped at query.xq, 1/53:
[FOER0000] Halted on error().

Stack Trace:
- query.xq, 2/9
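
The opposite case can be sketched as well: inlining is disabled globally and enforced for a single function. The sketch assumes that INLINELIMIT can be assigned in the prolog as db:inlinelimit, in the same way as the database options shown earlier; local:id is an illustrative name:

```xquery
(: disable function inlining for the whole query... :)
declare option db:inlinelimit '0';
(: ...but enforce it for this one function :)
declare %basex:inline function local:id($x) { $x };
local:id(123)
```

With the annotation in place, the optimizer should be able to reduce the query to 123.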

Lazy Evaluation

%basex:lazy enforces lazy evaluation of a global variable.

Example:

declare %basex:lazy variable $january := doc('does-not-exist');
if(month-from-date(current-date()) = 1) then $january else ()

The annotation ensures that an error will only be raised if the condition yields true. Without the annotation, the error will always be raised, because the referenced document is not found.

XQuery Locks

Introduced with Version 9.1:

In analogy with option declarations and pragmas, XQuery Locks can also be set via annotations:

declare %basex:write-lock('CONFIGLOCK') function local:write() {
  file:write('config.xml', <config/>)
};
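
A reading function can presumably be protected in the same way. The sketch below assumes that a %basex:read-lock annotation is available as the counterpart of %basex:write-lock, and that readers and writers are coordinated through the shared lock name; local:read is an illustrative name:

```xquery
(: assumption: holders of the CONFIGLOCK read lock wait for a
   concurrent writer that uses the same lock name :)
declare %basex:read-lock('CONFIGLOCK') function local:read() {
  file:read-text('config.xml')
};
local:read()
```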

Functions

Regular Expressions

Introduced with Version 9.1:

In analogy with Saxon, you can specify the flag j to revert to Java’s default regex parser. For example, this allows you to use the word boundary option \b, which has not been included in the XQuery grammar for regular expressions:

Example:

(: yields "!Hi! !there!" :)
replace('Hi there', '\b', '!', 'j')
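
As a variation of the example above, \b can also anchor a pattern to whole words. This is only a sketch; the input string and the replacement are arbitrary, and the call is expected to replace the standalone word there and nothing else:

```xquery
(: \b is only accepted here because of the j flag :)
replace('Hi there', '\bthere\b', 'you', 'j')
```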

Serialization

  • basex is used as the default serialization method: nodes are serialized as XML, atomic values are serialized as strings, and items of binary type are output in their native byte representation. Function items (including maps and arrays) are output just like with the adaptive method.
  • csv allows you to output XML nodes as CSV data (see the CSV Module for more details).

For more information and some additional BaseX-specific parameters, see the article on Serialization.
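
As a rough sketch of the csv method, the query below declares the serialization method in the prolog and returns an element structure of the kind the CSV Module works with. The output namespace is declared explicitly so that the snippet does not rely on a predeclared prefix; the element structure follows the module's direct format as far as we can tell, and the values are made up for illustration:

```xquery
declare namespace output = 'http://www.w3.org/2010/xslt-xquery-serialization';
(: serialize the result as CSV instead of XML :)
declare option output:method 'csv';
<csv>
  <record>
    <entry>John</entry>
    <entry>Berlin</entry>
  </record>
</csv>
```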

Non-Determinism

In XQuery, deterministic functions are “guaranteed to produce ·identical· results from repeated calls within a single ·execution scope· if the explicit and implicit arguments are identical”. In BaseX, many extension functions are non-deterministic or side-effecting. If an expression is internally flagged as non-deterministic, various optimizations that might change its execution order will not be applied.

(: QUERY A... :)
let $n := 456
for $i in 1 to 2
return $n

(: ...will be optimized to :)
for $i in 1 to 2
return 456

(: QUERY B will not be rewritten :)
let $n := random:integer()
for $i in 1 to 2
return $n

In some cases, functions may contain non-deterministic code, but the query compiler may not be able to detect this statically. See the following example:

for $read in (file:read-text#1, file:read-binary#1)
let $ignored := non-deterministic $read('input.file')
return ()

Two non-deterministic functions will be bound to $read, and the result of the function call will be bound to $ignored. As the variable is not referenced in the subsequent code, the let clause would usually be discarded by the compiler. In the given query, however, execution will be enforced because of the BaseX-specific non-deterministic keyword.
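
The BaseX Pragmas section describes a pragma with the same name. As a sketch, and assuming the pragma has the same effect as the keyword in this situation, the query could also be written as:

```xquery
for $read in (file:read-text#1, file:read-binary#1)
(: the pragma flags the dynamic function call as non-deterministic :)
let $ignored := (# basex:non-deterministic #) { $read('input.file') }
return ()
```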

Suffixes

In BaseX, files with the suffixes .xq, .xqm, .xqy, .xql, .xqu and .xquery are treated as XQuery files. In XQuery, there are main and library modules:

  • Main modules have an expression as query body. Here is a minimal example:
'Hello World!'
  • Library modules start with a module namespace declaration and have no query body:
module namespace hello = 'http://basex.org/examples/hello';

declare function hello:world() {
  'Hello World!'
};

We recommend .xq as suffix for main modules, and .xqm for library modules. However, the actual module type will dynamically be detected when a file is opened and parsed.
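
To show how the two module types fit together, a main module can import the library module above and call its function. This is a sketch that assumes the library module was saved as hello.xqm in the same directory as the main module; both file names are hypothetical:

```xquery
(: main module, e.g. saved as hello.xq :)
import module namespace hello = 'http://basex.org/examples/hello' at 'hello.xqm';
hello:world()
```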

Miscellaneous

Various other extensions are described in the articles on XQuery Full Text and XQuery Update.

Changelog

Version 9.1
  • Added: XQuery Locks via pragmas and function annotations.
  • Added: Regular Expressions, j flag for using Java’s default regex parser.