Difference between revisions of "XQuery Update"

From BaseX Documentation

Revision as of 14:04, 26 August 2018

This article is part of the XQuery Portal. It summarizes the update features of BaseX.

BaseX offers a complete implementation of the XQuery Update Facility (XQUF). This article provides a brief, basic introduction to the XQUF. First, some examples of update expressions are given. After that, the challenges that arise from the functional semantics of the language are addressed in the Concepts section.

Features

Updating Expressions

There are five new expressions to modify data. While insert, delete, rename and replace are basically self-explanatory, the transform expression is different, as modified nodes are copied in advance and the original databases remain untouched.

An expression consists of a target node (the node we want to alter) and additional information like insertion nodes, a QName, etc. which depends on the type of expression. Optional modifiers are available for some of them. You can find a few examples and additional information below.

insert

insert node (attribute { 'a' } { 5 }, 'text', <e/>) into /n

The insert expression inserts a sequence of nodes into a single target node. Several modifiers are available to specify the exact insert location: into, as first into, as last into, before and after.

Note: in most cases, as last and after will be evaluated faster than as first and before!
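
The insert locations can be tried out on an in-memory copy, so that no database is needed. The following is a minimal sketch; the element names are made up for illustration:

```xquery
(: Each insert targets a different position relative to <x/>. :)
copy $c := <n><x/></n>
modify (
  insert node <first/>  as first into $c,
  insert node <last/>   as last into $c,
  insert node <before/> before $c/x,
  insert node <after/>  after $c/x
)
return $c
```

The returned copy should contain the children in the order first, before, x, after, last.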

delete

delete node //n

The example query deletes all <n> elements in your database. Note that, in contrast to other updating expressions, the delete expression allows multiple nodes as a target.
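
As a small illustration of a multi-node target, the following sketch deletes several elements from an in-memory copy in a single expression. The sample document is hypothetical:

```xquery
(: The target expression $c//n selects two nodes; both are deleted. :)
copy $c := <doc><n/><keep/><n/></doc>
modify delete node $c//n
return $c
```

Only the keep element remains inside doc.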

replace

replace node /n with <a/>

The target element is replaced by the DOM node <a/>. You can also replace the value of a node or its descendants by using the modifier value of.

replace value of node /n with 'newValue'

All descendants of /n are deleted and the given text is inserted as the only child. Note that the result of the insert sequence is either a single text node or an empty sequence. If the insert sequence is empty, all descendants of the target are deleted. Consequently, replacing the value of a node leaves the target with either a single text node or no descendants at all.
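
Both outcomes can be sketched on an in-memory copy. The sample elements are hypothetical:

```xquery
copy $c := <doc><n><a>old</a><b/></n><m>text</m></doc>
modify (
  (: n loses its children a and b and receives a single text node :)
  replace value of node $c/n with 'newValue',
  (: an empty replacement leaves m without any descendants :)
  replace value of node $c/m with ()
)
return $c
```

Afterwards, n should contain only the text newValue, and m should be empty.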

rename

for $n in //originalNode
return rename node $n as 'renamedNode' 

All originalNode elements are renamed. An iterative approach helps to modify multiple nodes within a single statement. Nodes on the descendant- or attribute-axis of the target are not affected. This has to be done explicitly as well.
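
If attributes are to be renamed as well, each of them needs its own rename expression. A minimal sketch, assuming a hypothetical id attribute that is to be called key:

```xquery
copy $c := <doc><originalNode id="1"/><originalNode id="2"/></doc>
modify (
  for $n in $c//originalNode
  return (
    (: renames the element only :)
    rename node $n as 'renamedNode',
    (: attributes are separate targets and are renamed explicitly :)
    for $a in $n/@id
    return rename node $a as 'key'
  )
)
return $c
```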

Non-Updating Expressions

copy/modify/return

copy $c := doc('example.xml')//originalNode[@id = 1]
modify rename node $c as 'copyOfNode'
return $c

The originalNode element with @id=1 is copied and subsequently assigned a new QName using the rename expression. Note that the transform expression is the only expression which returns an actual XDM instance as a result. You can therefore use it to modify results and especially DOM nodes. This is an issue beginners are often confronted with. More on this topic can be found in the XQUF Concepts section.

The following example demonstrates a common use case:

Query:

copy $c :=
  <entry>
    <title>Transform expression example</title>
    <author>BaseX Team</author>
  </entry>
modify (
  replace value of node $c/author with 'BaseX',
  replace value of node $c/title with concat('Copy of: ', $c/title),
  insert node <author>Joey</author> into $c
)
return $c

Result:

<entry>
  <title>Copy of: Transform expression example</title>
  <author>BaseX</author>
  <author>Joey</author>
</entry>

The <entry> element (here it is passed to the expression as a DOM node) can also be replaced by a database node, e.g.:

copy $c := (db:open('example')//entry)[1]
...

In this case, the original database node remains untouched as well, as all updates are performed on the node copy.

Here is an example where we return an entire document, parts modified and all:

copy $c := doc("zaokeng.kml")
modify (
  for $d in $c//*:Point
  return insert node (
    <extrude>1</extrude>,
    <altitudeMode>relativeToGround</altitudeMode>
  )  before $d/*:coordinates
)
return $c

update

The update expression is a BaseX-specific convenience operator for the copy/modify/return construct:

  • Similar to the XQuery 3.0 map operator, the value of the first expression is bound as context item, and the second expression performs updates on this item. The updated item is returned as result:

for $item in db:open('data')//item
return $item update delete node text()

  • More than one node can be specified as source:

db:open('data')//item update delete node text()

  • If wrapped with curly braces, update expressions can be chained:

<root/> update {
  insert node <child/> into .
} update {
  insert node "text" into child
}

transform with

The transform with expression was added to the current XQuery Update 3.0 working draft. It is a simple version of the update expression and also available in BaseX:

<xml>text</xml> transform with {
  replace value of node . with 'new-text'
}

Functions

Built-in Functions

fn:put() can be used to serialize XDM instances to secondary storage:

  • The function will be executed after all other updates.
  • Serialized documents therefore reflect all changes made effective during a query.
  • No files will be created if the addressed nodes have been deleted.
  • Serialization parameters can be specified as third argument (more details are found in the XQUF 3.0 Specification).
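
A minimal sketch of a two-argument call is shown below. The file name entry.xml and the sample element are assumptions made for illustration; a relative URI is resolved by the processor:

```xquery
(: The element is modified in main memory first; fn:put() then
   schedules the serialization, which runs after all other updates. :)
let $entry := <entry><title>Draft</title></entry>
return put(
  $entry update { replace value of node ./title with 'Final' },
  'entry.xml'
)
```

Because fn:put() is itself an updating function, the query returns no result; the file is written when the pending updates are applied.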

Numerous additional database functions exist for performing updates on document and database level.

User-Defined Functions

If an updating function item is called, the function call must be prefixed with the keyword updating. This ensures that the query compiler can statically detect if an invoked function item will perform updates or not:

let $node := <node>TO-BE-DELETED</node>
let $delete-text := %updating function($node) {
  delete node $node//text()
}
return $node update (
  updating $delete-text(.)
)

As shown in the example, user-defined and anonymous functions can additionally be annotated as %updating.
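
For comparison, here is a sketch of a declared (named) updating function. The function name local:strip-text is hypothetical. Since the function is called statically, the updating keyword is not needed in front of the call:

```xquery
(: Updating functions carry the %updating annotation
   and declare no return type. :)
declare %updating function local:strip-text($e as element()) {
  delete node $e//text()
};

<a>TO-BE-DELETED<b>too</b></a> update { local:strip-text(.) }
```

The result should be the a element with an empty b child, as all text nodes have been deleted.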

Concepts

There are a few peculiarities of XQuery Update that you should know about. In addition to the simple expression, the XQUF adds the updating expression as a new type of expression. An updating expression returns only a Pending Update List (PUL), which is subsequently applied to the addressed databases and DOM nodes. A simple expression cannot perform any permanent changes and returns an empty or non-empty sequence.

Pending Update List

The most important thing to keep in mind when using XQuery Update is the Pending Update List (PUL). Updating statements are not executed immediately, but are first collected as update primitives within a set-like structure. After the evaluation of the query, and after some consistency checks and optimizations, the update primitives will be applied in the following order:

  • Backups (1): db:create-backup()
  • XQuery Update: insert before, delete, replace, rename, replace value, insert attribute, insert into first, insert into, insert into last, insert, insert after, put
  • Documents: db:add(), db:store(), db:replace(), db:rename(), db:delete(), db:optimize(), db:flush()
  • Users: user:grant(), user:password(), user:drop(), user:alter(), user:create()
  • Databases: db:copy(), db:drop(), db:alter(), db:create()
  • Backups (2): db:restore(), db:drop-backup()

If an inconsistency is found, an error message is returned and all accessed databases remain untouched (atomicity). For the user, this means that updates are only visible after the end of a snapshot.

It may be surprising to see db:create in the lower part of this list. This means that a newly created database cannot be accessed by the same query, which can be explained by the semantics of updating queries: all expressions can only be evaluated on databases that already exist while the query is evaluated. As a consequence, db:create is mainly useful in the context of Command Scripts or Web Applications, in which a redirect to another page can be triggered after a database has been created.

Example

The query…

insert node <b/> into /doc,
for $n in /doc/child::node()
return rename node $n as 'justRenamed'

…applied on the document…

<doc> <a/> </doc>

…results in the following document:

<doc> <justRenamed/><b/> </doc>

Despite explicitly renaming all child nodes of <doc/>, the former <a/> element is the only one to be renamed. The <b/> element is inserted within the same snapshot and is therefore not yet visible to the rename expression.
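
The same behavior can be reproduced without a database. The following sketch wraps the example in a copy/modify/return expression, in which the pending updates are collected in the modify clause and applied before the return clause is evaluated:

```xquery
copy $doc := <doc><a/></doc>
modify (
  insert node <b/> into $doc,
  (: at this point, <a/> is still the only child of $doc :)
  for $n in $doc/child::node()
  return rename node $n as 'justRenamed'
)
return $doc
```

The returned copy should contain a justRenamed element followed by an unrenamed b element.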

Returning Results

By default, it is not possible to mix different types of expressions in a query result. The outermost expression of a query must either be a collection of updating or non-updating expressions. But there are two ways out:

  • The BaseX-specific update:output() function bridges this gap: it caches the results of its arguments at runtime and returns them after all updates have been processed. The following example performs an update and returns a success message:

update:output("Update successful."), insert node <c/> into doc('factbook')/mondial

  • With the MIXUPDATES option, all updating constraints will be turned off. Returned nodes will be copied before they are modified by updating expressions. An error is raised if items are returned within a transform expression.

If you want to modify nodes in main memory, you can use the transform expression.

Effects

Original Files

In BaseX, all updates are performed on database nodes or in main memory. By default, update operations do not affect the original input file (the info string "Updates are not written back" appears in the query info to indicate this). The following solutions exist to write XML documents and binary resources to disk:

  • Updates on main-memory instances of files that have been retrieved via fn:doc or fn:collection will be propagated back to disk when the WRITEBACK option is turned on. This option can also be activated on the command line via -u. Make sure you back up the original documents before running your queries.
  • Functions like fn:put or file:write can be used to write single XML documents to disk. With file:write-binary, you can write binary resources.
  • The EXPORT command can be used to write all resources of a database to disk.
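
As a sketch of the second option, the following query modifies a document in main memory and writes the result to a new file with file:write. Both file names are hypothetical, and the original file is left untouched:

```xquery
(: Remove all comments from an in-memory copy of the document. :)
let $updated := doc('example.xml') update {
  delete node .//comment()
}
(: Write the modified copy to a separate file. :)
return file:write('example-updated.xml', $updated)
```

Writing to a separate file avoids overwriting the input, which is one way to follow the backup advice given above.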

Indexes

Index structures are discarded after update operations when UPDINDEX is turned off (which is the default). More details are found in the article on Indexing.

Error Messages

Along with the Update Facility, a number of new error codes and messages have been added to the specification and BaseX. All errors are listed in the XQuery Errors overview.

Please remember that the collected updates will be executed after the query evaluation. All logical errors will be raised before the updates are actually executed.
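
To illustrate, the following sketch collects two conflicting rename primitives for the same target. According to the XQUF specification, this raises the error XUDY0015 during the consistency check, before any change is applied. The try/catch wrapper is only there to show the error code:

```xquery
try {
  copy $c := <a/>
  modify (
    (: two renames of the same node conflict with each other :)
    rename node $c as 'b',
    rename node $c as 'c'
  )
  return $c
} catch * {
  'Update rejected: ' || $err:code
}
```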

Changelog

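Version 9.0
  • Updated: Built-in Functions: serialization parameters
Version 8.5
  • Added: transform with
  • Updated: update was extended.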
Version 8.0
  • Added: MIXUPDATES option for Returning Results in updating expressions
  • Added: information message if files are not written back
Version 7.8
  • Added: update convenience operator