Difference between revisions of "XQuery Update"
Revision as of 12:47, 9 November 2013
This article is part of the XQuery Portal. It summarizes the update features of BaseX.
BaseX offers a complete implementation of the XQuery Update Facility (XQUF). This article aims to provide a very quick and basic introduction to the XQUF. First, some examples of update expressions are given. After that, a few problems are addressed that frequently arise due to the nature of the language. These are covered in the Concepts section.
Features
Updating Expressions
There are five new expressions to modify data. While insert, delete, rename and replace are basically self-explanatory, the transform expression is different, as modified nodes are copied in advance and the original databases remain untouched.
An expression consists of a target node (the node we want to alter) and additional information, such as insertion nodes or a QName, depending on the type of expression. Optional modifiers are available for some of them. You can find a few examples and additional information below.
insert
insert node (attribute { 'a' } { 5 }, 'text', <e/>) into /n
The insert expression inserts a sequence of nodes into a single target node. Several modifiers are available to specify the exact insert location: as first into/as last into, before/after and into.
Note: in most cases, as last and after will be evaluated faster than as first and before!
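The modifiers can be tried out without touching a database by wrapping them in a transform expression. The following is a minimal sketch; the element names (n, x, first, last, before, after) are made up for illustration:

```xquery
(: Work on an in-memory copy so that no database is changed. :)
copy $c := <n><x/></n>
modify (
  (: becomes the first child of <n> :)
  insert node <first/> as first into $c,
  (: becomes the last child of <n> :)
  insert node <last/> as last into $c,
  (: becomes the preceding sibling of <x/> :)
  insert node <before/> before $c/x,
  (: becomes the following sibling of <x/> :)
  insert node <after/> after $c/x
)
return $c
```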
delete
delete node //node
The example query deletes all <node> elements in your database. Note that, in contrast to other updating expressions, the delete expression allows multiple nodes as a target.
replace
replace node /n with <a/>
The target element is replaced by the DOM node <a/>. You can also replace the value of a node or its descendants by using the modifier value of.
replace value of node /n with 'newValue'
All descendants of /n are deleted and the given text is inserted as the only child. Note that the result of the insert sequence is either a single text node or an empty sequence. If the insert sequence is empty, all descendants of the target are deleted. Consequently, replacing the value of a node leaves the target with either a single text node or no descendants at all.
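Both outcomes described above can be compared side by side on in-memory copies. This is a sketch only; the sample element <n> and its content are assumptions chosen for illustration:

```xquery
(
  (: A string replaces all children with a single text node. :)
  copy $c := <n><a/>old<b/></n>
  modify replace value of node $c with 'newValue'
  return $c,

  (: An empty sequence removes all children of the target. :)
  copy $c := <n><a/>old<b/></n>
  modify replace value of node $c with ()
  return $c
)
```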
rename
for $n in //node return rename node $n as 'renamedNode'
All node elements are renamed. An iterative approach helps to modify multiple nodes within a single statement. Nodes on the descendant- or attribute-axis of the target are not affected. This has to be done explicitly as well.
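Because attributes and descendants are not renamed implicitly, each group of targets needs its own loop. The sketch below illustrates this on an in-memory copy; the new names item and key and the sample input are hypothetical:

```xquery
copy $c := <node id="1"><node id="2"/></node>
modify (
  (: Rename the copied element and all nested <node> elements. :)
  for $n in $c/descendant-or-self::node
  return rename node $n as 'item',
  (: Attributes are not affected by the loop above, so rename them explicitly. :)
  for $a in $c//@id
  return rename node $a as 'key'
)
return $c
```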
Non-Updating Expressions
transform
copy $c := doc('example.xml')//node[@id = 1] modify rename node $c as 'copyOfNode' return $c
The node element with @id=1 is copied and subsequently assigned a new QName using the rename expression. Note that the transform expression is the only expression which returns an actual XDM instance as a result. You can therefore use it to modify results and especially DOM nodes. This is an issue beginners are often confronted with. More on this topic can be found in the XQUF Concepts section.
The following example demonstrates a common use case:
Query:
copy $c :=
  <entry>
    <title>Transform expression example</title>
    <author>BaseX Team</author>
  </entry>
modify (
  replace value of node $c/author with 'BaseX',
  replace value of node $c/title with concat('Copy of: ', $c/title),
  insert node <author>Joey</author> into $c
)
return $c
Result:
<entry>
  <title>Copy of: Transform expression example</title>
  <author>BaseX</author>
  <author>Joey</author>
</entry>
The <entry> element (here it is passed to the expression as a DOM node) can also be replaced by a database node, e.g.:
copy $c := (db:open('example')//entry)[1] ...
In this case, the original database node remains untouched as well, as all updates are performed on the node copy.
Here is an example where we return an entire document, parts modified and all:
copy $c := doc("zaokeng.kml")
modify (
  for $d in $c//*:Point
  return insert node (
    <extrude>1</extrude>,
    <altitudeMode>relativeToGround</altitudeMode>
  ) before $d/*:coordinates
)
return $c
Functions
fn:put
fn:put() is also part of the XQUF and enables the user to serialize XDM instances to secondary storage. It is executed at the end of a snapshot. Serialized documents therefore reflect all changes made effective during a query.
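As a rough sketch of how fn:put() might be combined with a transform expression: the modified copy is built first and then serialized. The target URI entry.xml and the sample content are assumptions, not names taken from this article:

```xquery
(: Build a modified in-memory copy first. :)
let $entry :=
  copy $c := <entry><title>Draft</title></entry>
  modify replace value of node $c/title with 'Final'
  return $c
(: put() is an updating function; the file is written
   when the pending updates of the query are applied. :)
return put($entry, 'entry.xml')
```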
Database Functions
Some additional, updating database functions exist in order to perform updates on document and database level.
Concepts
There are a few peculiarities of XQuery Update that you should know about. In addition to the simple expression, the XQUF adds the updating expression as a new type of expression. An updating expression returns only a Pending Update List (PUL) as a result, which is subsequently applied to the addressed databases and DOM nodes. A simple expression cannot perform any permanent changes and returns an empty or non-empty sequence.
Pending Update List
The most important thing to keep in mind when using XQuery Update is the Pending Update List (PUL). Updating statements are not executed immediately, but are first collected as update primitives within a set-like structure. At the end of a query, after some consistency checks and optimizations, the update primitives will be applied in the following order:
insert, insert into, insert into last, insert attribute, insert into first, replace value, rename, put, replace, delete, insert before, db:add(), db:store(), db:replace(), db:rename(), db:delete(), db:optimize(), db:flush(), db:drop(), db:create()
If an inconsistency is found, an error message is returned and all accessed databases remain untouched (atomicity). For the user, this means that updates are only visible after the end of a snapshot.
It may be surprising to see db:create at the bottom of this list. This means that a newly created database cannot be accessed by the same query, which can be explained by the semantics of updating queries: all expressions can only be evaluated on databases that already exist while compiling and evaluating the query. As a result, db:create is mainly useful in the context of Command Scripts or Web Applications, in which a redirect to another page can be triggered after having created a database.
Example
The query…
insert node <b/> into /doc, for $n in /doc/child::node() return rename node $n as 'justRenamed'
…applied on the document…
<doc> <a/> </doc>
…results in the following document:
<doc> <justRenamed/><b/> </doc>
Despite explicitly renaming all child nodes of <doc/>, the former <a/> element is the only one to be renamed. The <b/> element is inserted within the same snapshot and is therefore not yet visible to the rename expression.
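If the inserted element should be renamed as well, the two updates have to run in separate snapshots. One possible way to sketch this in main memory is to nest two transform expressions, so that the inner update list is applied before the outer one is evaluated. The variable names $step1 and $step2 are hypothetical:

```xquery
copy $step2 := (
  (: First snapshot: the insert is applied when the inner transform returns. :)
  copy $step1 := <doc><a/></doc>
  modify insert node <b/> into $step1
  return $step1
)
modify (
  (: Second snapshot: both <a/> and <b/> are now visible as children. :)
  for $n in $step2/child::node()
  return rename node $n as 'justRenamed'
)
return $step2
```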
Returning Results
It is not possible to mix different types of expressions in a query result. The outermost expression of a query must be either a collection of updating expressions or a collection of non-updating expressions. The only way to perform any updating queries and return a result at the same time is to use the BaseX-specific db:output() function, which caches the results of its arguments at runtime and returns them after all updates have been processed.
Example: Perform update and return success message.
db:output("Update successful."), insert node <c/> into doc('factbook')/mondial
If you want to modify temporary nodes in main memory without storing them in a database, you can use the transform expression.
Function Declaration
To use updating expressions within a function, the %updating annotation has to be added to the function declaration. A correct declaration of a function that contains updating expressions (or one that calls updating functions) looks like this:
declare %updating function { ... }
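A complete declaration also needs a function name and a parameter list. The following sketch shows one possible shape; the function local:add-child and its parameters are made up for illustration and are not part of BaseX:

```xquery
(: Hypothetical helper: inserts an empty element with the given name.
   Updating functions do not declare a return type. :)
declare %updating function local:add-child(
  $target as node(),
  $name   as xs:string
) {
  insert node element { $name } { } into $target
};

(: The updating function is called inside the modify clause of a transform. :)
copy $c := <n/>
modify local:add-child($c, 'child')
return $c
```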
Effects
Original Files
In BaseX, all updates are performed on database nodes or in main memory. By default, update operations never affect the original input file. The following solutions exist to write XML documents and binary resources to disk:
- The EXPORT command can be used to write all resources of a database to disk.
- Functions like fn:put or file:write can be used to write single XML documents to disk. With file:write-binary, you can write binary resources.
- Updates on main-memory instances of files that have been retrieved via fn:doc or fn:collection will be propagated to disk when the WRITEBACK option is turned on. This option can also be activated on the command line via -u. Make sure you back up the original documents before running your queries.
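As an illustration of the second option, a modified copy of a document can be written to a new file, which leaves the original input untouched. This is a sketch under assumptions: the output path example-updated.xml is hypothetical, and example.xml is assumed to exist and to contain <node> elements:

```xquery
(: Modify an in-memory copy of the input document. :)
let $updated :=
  copy $c := doc('example.xml')
  modify (
    for $n in $c//node
    return rename node $n as 'renamedNode'
  )
  return $c
(: Write the result to a separate file instead of overwriting the input. :)
return file:write('example-updated.xml', $updated)
```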
Indexes
Index structures are discarded after update operations when UPDINDEX is turned off (which is the default). More details are found in the article on Indexing.
Error Messages
Along with the Update Facility, a number of new error codes and messages have been added to the specification and BaseX. All errors are listed in the XQuery Errors overview.