Changes

150 bytes removed ,  14:49, 3 August 2015
Editorial-type tweaks
This article is part of the [[Advanced User's Guide]] and introduces the available index structures, which the query optimizer uses to rewrite expressions and speed up query evaluation.
Nearly all examples in this article are based on the [http://files.basex.org/xml/factbook.xml factbook.xml] document. To see how a query is rewritten, please turn on the [[GUI#Visualizations|Info View]] in the GUI or use the [[Command-Line Options#BaseX_Standalone|-V flag]] on the command line.
=Structural Indexes=
The path index (also called ''path summary'') stores all distinct paths of the documents in the database. It contains the same statistical information as the name index. The statistics are discarded after database updates and can be recreated with the [[Commands#OPTIMIZE|OPTIMIZE]] command.
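As a minimal sketch, the statistics can also be recreated from within XQuery. It assumes that a database named {{Code|factbook}} already exists and uses the {{Code|db:optimize}} function of the Database Module as the counterpart to the OPTIMIZE command:

<pre class="brush:xquery">
(: assumption: a database named 'factbook' exists :)
(: rebuilds outdated index structures and statistics after updates :)
db:optimize('factbook')
</pre>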
The path index is applied to rewrite descendant steps to multiple child steps. Child steps can be evaluated faster, as fewer nodes have to be accessed:
<pre class="brush:xquery">
(: 2nd example :)
doc('factbook.xml')//name[. = 'Germany'],
(: 3rd example :)
for $c in db:open('factbook')//country
where $c//city/name = 'Hanoi'
The [[Full-Text]] index speeds up queries using the {{Code|contains text}} expression. Internally, two index structures are provided: the default index sorts all keys alphabetically and by their character length. It is particularly fast if fuzzy searches are performed. The second index is a compressed trie structure, which needs slightly more memory, but is specialized for wildcard searches. Both index structures will be merged in a future version of BaseX.
 
The following queries are examples for expressions that will be optimized for index access (provided that the relevant index exists in a particular database):
If the full-text index exists, the following queries will all be rewritten for index access:
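The following queries are illustrative sketches only and are not taken from the original example list. They assume a {{Code|factbook}} database created with the full-text index enabled. The search terms are arbitrary, and {{Code|using fuzzy}} is a BaseX-specific extension of XQuery Full Text. Whether a given query is actually rewritten can be checked in the Info View:

<pre class="brush:xquery">
(: plain term search on text nodes :)
db:open('factbook')//country[name/text() contains text 'Germany'],
(: wildcard search: '.*' matches zero or more characters :)
db:open('factbook')//city[name/text() contains text 'Han.*' using wildcards],
(: fuzzy search (BaseX extension): tolerates small spelling differences :)
db:open('factbook')//country[name/text() contains text 'Germeny' using fuzzy]
</pre>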
If main memory runs out while creating a value index, the currently generated index structures will be partially written to disk and eventually merged. If the memory heuristics fail for some reason (e.g., because multiple index operations run at the same time), fixed index split sizes may be chosen via the [[Options#INDEXSPLITSIZE|INDEXSPLITSIZE]] and [[Options#FTINDEXSPLITSIZE|FTINDEXSPLITSIZE]] options.
If [[Options#DEBUG|DEBUG]] is set to true, and if a new database is created from the command line, the number of index operations will be written to standard output; this might help you to choose a proper split size. The following example shows what the output can look like for a 111 MB document and 128 MB of available main memory:
<pre>
</pre>
The info string {{Code|3 splits}} indicates that three partial full-text index structures were written to disk, and the string {{Code|12089347 operations}} indicates that the index construction consisted of approximately 12 million index operations. If we set [[Options#FTINDEXSPLITSIZE|FTINDEXSPLITSIZE]] to the fixed value {{Code|4000000}} (12 million divided by three), or a smaller value, we should be able to build the index and circumvent the memory heuristics.
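A hedged sketch of how such a fixed split size might be supplied when creating a database from XQuery. The database name {{Code|mydb}} and the input file {{Code|input.xml}} are hypothetical, and it is an assumption that your BaseX version accepts {{Code|ftindexsplitsize}} in the options map of {{Code|db:create}}; if it does not, set the option globally before creating the database:

<pre class="brush:xquery">
(: 12089347 operations in 3 splits: 12089347 idiv 3 = 4029782, :)
(: so 4000000 stays just below the observed operations per split :)
db:create(
  'mydb',       (: hypothetical database name :)
  'input.xml',  (: hypothetical input document :)
  'input.xml',  (: target path inside the database :)
  map { 'ftindex': true(), 'ftindexsplitsize': 4000000 }
)
</pre>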
=Updates=
when requested by the database users.
With the [[Options#UPDINDEX|UPDINDEX]] option, text and attribute
index structures will be updated incrementally. This option must be
turned on before the database is created or optimized.
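As a minimal sketch, the option can be passed in the options map when a database is created from XQuery. The database name and input file are assumptions chosen to match the factbook examples:

<pre class="brush:xquery">
(: assumption: 'factbook.xml' is available in the working directory :)
db:create(
  'factbook', 'factbook.xml', 'factbook.xml',
  map { 'updindex': true() }
)
(: for an existing database, a full optimize can enable the option instead: :)
(: db:optimize('factbook', true(), map { 'updindex': true() }) :)
</pre>

The commented alternative is a separate updating query; it rebuilds all structures, which can take a while on large databases.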