Changes

XQuery Optimizations (edit)

Revision as of 09:20, 8 April 2021

1,794 bytes added , 09:20, 8 April 2021

This article is part of the [[XQuery|XQuery Portal]]. Optimizations are presented that speed up the execution time and reduce memory consumption.

The ~~article~~ text will be ~~further~~ regularly extended with ~~each version of BaseX~~further examples.

=Introduction=

Subsequent rewritings might result in query plans that differ a lot from the original query. As this might complicate debugging, you can disable function inling during development by setting {{Option|INLINELIMIT}} to {{Code|0}}.

==~~Static Typing~~Loop Unrolling== {{Mark|Introduced with Version 9.6:}} Loops with few iterations are ''unrolled'' by the XQuery compiler to enable further optimizations: <syntaxhighlight lang="xquery">(1 to 2) ! (. * 2) (: rewritten to :)1 ! (. * 2), 2 ! (. * 2) (: further rewritten to :)1 * 2, 2 * 2 (: further rewritten to :)2, 4</syntaxhighlight> Folds are unrolled, too: <syntaxhighlight lang="xquery">let $f := function($a, $b) { $a * $b }return fold-left(2 to 5, 1, $f) (: rewritten to :)let $f := function($a, $b) { $a * $b }return $f($f($f($f(1, 2), 3), 4), 5)</syntaxhighlight> The standard unroll limit is <code>5</code>. It can be adjusted with the {{Option|UNROLLLIMIT}} option, e.g. via a pragma: <syntaxhighlight lang="xquery">(# db:unrolllimit 10 #) { for $i in 1 to 10 return db:open('db' || $i)//*[text() = 'abc']} (: rewritten to :)db:open('db1')//*[text() = 'abc'],db:open('db2')//*[text() = 'abc'],...db:open('db10')//*[text() = 'abc'],</syntaxhighlight> The last example indicates that index rewritings might be triggered by unrolling loops with paths on database nodes. The following expressions can be unrolled: * Simple map expressions* Simple FLWOR expressions* Filter expressions* [[Higher-Order Functions#fn:fold-left|fn:fold-left]], [[Higher-Order Functions#fn:fold-right|fn:fold-right]], {{Function|Higher-Order Functions|fn:fold-left1}} Care should be taken if a higher value is selected, as memory consumption and compile time will increase. ==Paths== Due to the compact syntax of XPath, it can make a big difference if a slash is added or omitted in a path expression. A classical example is the double slash {{Code|//}}, which is a shortcut for {{Code|descendant-or-node()/}}. If the query is evaluated without optimizations, all nodes of a document are gathered, and for each of them, the next step is evaluated. This leads to a potentially huge number of duplicate node tree traversals, most of which are redundant, as all duplicate nodes will be removed at the end anyway. In most cases, paths with a double slash can be rewritten to descendant steps… <syntaxhighlight lang="xquery">(: equivalent queries, with identical syntax trees :)doc('addressbook.xml')//city,doc('addressbook.xml')/descendant-or-self::node()/child::city (: rewritten to :)doc('addressbook.xml')/descendant::city</syntaxhighlight> …unless the last step does not contain a positional predicate: <syntaxhighlight lang="xquery">doc('addressbook.xml')//city[1]</syntaxhighlight> As the positional test refers to the city child step, a rewritten query would yield different steps. Paths may contain predicates that will be evaluated again by a later axis step. Such predicates are either shifted down or discarded: <syntaxhighlight lang="xquery">(: equivalent query :)a[b]/b[c/d]/c (: rewritten to :)a/b/c[d]</syntaxhighlight>

~~If the type~~ Names of ~~a value is known at compile time, type checks~~ nodes can be ~~removed~~specified via name tests or predicates. ~~In the example below~~If names are e.g. supplied via external variables, the ~~static information that {{Code|$i}} will always reference items of type {{Code|xs:integer}}~~ predicates can often be ~~utilized to simplify the expression~~dissolved:

~~for~~ declare variable $~~i in 1 to 5return typeswitch($i)~~ ~~case xs~~name external :~~numeric return~~ = '~~number~~city'; ~~default return~~ db:open('~~string~~addressbook')/descendant::*[name() = $name]

(: rewritten to :)

~~for $i in 1 to 5return~~ db:open('~~number~~addressbook')/descendant::city

</syntaxhighlight>

* {{Code|where}} clauses are rewritten to predicates.

* {{Code|if}} expressions in the return clause are rewritten to {{Code|where}} clauses.

~~Since {{Version|9.4}}, the~~ * The last {{Code|for}} clause is merged into the {{Code|return}} clause and rewritten to a [[XQuery_3.0|Simple_Map_Operator|simple map]] expression.

Various of these rewriting are demonstrated in the following example:

</syntaxhighlight>

==Static Typing== If the type of a value is known at compile time, type checks can be removed. In the example below, the static information that {{Code|$i}} will always reference items of type {{Code|xs:integer}} can be utilized to simplify the expression: <syntaxhighlight lang="xquery">for $i in 1 to 5return typeswitch($i) case xs:numeric return 'number' default return 'string' (: rewritten to :)for $i in 1 to 5return 'number'</syntaxhighlight> ==Pure Logic==

If expressions can often be simplified:

| Distributivity

|- valign="top"

| <code>$a ~~and~~ or not($a)</code>

| <code>true()</code>

| Tertium non datur

* <code>xs:double('NaN') * 0</code> yields <code>NaN</code> instead of <code>0</code>

* <code>true#0 and true#0</code> must raise an error; it cannot be simplified to <code>true#0</code>

~~==Paths==~~

Due to the compact syntax of XPath, it can make a big difference if a slash is added or omitted in a path expression. A classical example is the double slash {{Code|//}}, which is a shortcut for {{Code|descendant-or-node()/}}. If the query is evaluated without optimizations, all nodes of a document are gathered, and for each of them, the next step is evaluated. This leads to a potentially huge number of duplicate node tree traversals, most of which are redundant, as all duplicate nodes will be removed at the end anyway.

~~In most cases, paths with a double slash can be rewritten to descendant steps…~~

~~<syntaxhighlight lang="xquery">~~

~~(: equivalent queries, with identical syntax trees :)~~

~~doc('addressbook.xml')//city,~~

~~doc('addressbook.xml')/descendant-or-node()/child::city~~

~~(: rewritten to :)~~

~~doc('addressbook.xml')/descendant::city~~

~~</syntaxhighlight>~~

~~…unless the last step does not contain a positional predicate:~~

~~<syntaxhighlight lang="xquery">~~

~~doc('addressbook.xml')//city[1]~~

~~</syntaxhighlight>~~

~~As the positional test refers to the city child step, a rewritten query would yield different steps.~~

~~Paths may contain predicates that will be evaluated again by a later axis step. Such predicates are either shifted down or discarded:~~

~~<syntaxhighlight lang="xquery">~~

~~(: equivalent query :)~~

~~a[b]/b[c/d]/c~~

~~(: rewritten to :)~~

~~a/b/c[d]~~

~~</syntaxhighlight>~~

~~Names of nodes can be specified via name tests or predicates. If names are e.g. supplied via external variables, the predicates can often be dissolved:~~

~~<syntaxhighlight lang="xquery">~~

~~declare variable $name external := 'city';~~

~~db:open('addressbook')//*[name() = $name]~~

~~(: rewritten to :)~~

~~db:open('addressbook')//city~~

~~</syntaxhighlight>~~

=Physical Optimizations=

==Index Rewritings==

A major feature of BaseX is the ability to rewrite all kinds of query patterns for ~~[[Indexes|~~index access]].

The following queries are all equivalent. They will be rewritten to exactly the same query that will eventually access the text index of a <code>factbook.xml</code> database instance (the file included in our full distributions):

for $p in $auction/site/people/person

let $a :=

for $t in $auction/site/~~closedauctions~~closed_auctions/~~closedauction~~closed_auction

where $t/buyer/@person = $p/@id

return $t

return <item person='"{ $p/name/text() }'">{ count($a) }</item>,

(: rewritten to :)

}</item>

</syntaxhighlight>

If the accessed database is not known at compile time, or if you want to give a predicate preference to another one, you can [[Indexes#Enforce Rewritings|enforce index rewritings]].

=Evaluation-Time Optimizations=

==Comparisons==

In many cases, the amount of data to be processed is only known after the query has been compiled. Moreover, the data that is looped through expressions may change. In those cases, the best optimizations needs to be chosen at runtime.

If sequences of items are compared against each other, a dynamic hash index will be generated, and the total number of comparisons can be significantly reduced. In the following example, <code>count($input1) * count($input2)</code> comparisons would need to be made without the intermediate index structure:

=Changelog=

;Version 9.6

* Added: {{Option|UNROLLLIMIT}}

Introduced with Version 9.4.

CG

Bureaucrats, editor, reviewer, Administrators

13,550

edits

Changes

XQuery Optimizations (edit)

Revision as of 09:20, 8 April 2021

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools