Changes

Indexes (edit)

Revision as of 15:49, 3 August 2015

150 bytes removed, 15:49, 3 August 2015

m

Editorial-type tweaks

This article is part of the [[Advanced User's Guide]] and introduces the available index structures, which are utilized by the query optimizer to rewrite expressions and speed up query evaluation.

Nearly all examples in this article are based on the [http://files.basex.org/xml/factbook.xml factbook.xml] document. To see how a query is rewritten, please turn on the [[GUI#Visualizations|Info View]] in the GUI or use the [[Command-Line Options#BaseX_Standalone|-V flag]] on the command line.

=Structural Indexes=

The path index (also called ''path summary'') stores all distinct paths of the documents in the database. It contains the same statistical information as the name index. The statistics are discarded after database updates and can be recreated with the [[Commands#OPTIMIZE|OPTIMIZE]] command.

The path index is applied to rewrite descendant steps to multiple child steps. Child steps can be evaluated faster, as fewer nodes have to be accessed:

(: 2nd example :)

doc('factbook.xml')//name[. = 'Germany'],

(: 3rd example :)

for $c in db:open('factbook')//country

where $c//city/name = 'Hanoi'
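As a rough illustration of the path index rewrite described above, the following sketch contrasts a descendant step with an equivalent chain of child steps. It assumes that the {{Code|country}} elements of the factbook database are located directly below a {{Code|mondial}} root element; this layout is an assumption and is not stated in this article. The form actually reported by the optimizer may differ and can be checked in the Info View.

```xquery
(: descendant step: without path information, all descendants are candidates :)
db:open('factbook')//country/name,
(: equivalent child steps, assuming the layout mondial/country/name :)
db:open('factbook')/mondial/country/name
```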

The [[Full-Text]] index speeds up queries using the {{Code|contains text}} expression. Internally, two index structures are provided: the default index sorts all keys alphabetically and by their character length. It is particularly fast if fuzzy searches are performed. The second index is a compressed trie structure, which needs slightly more memory, but is specialized for wildcard searches. Both index structures will be merged in a future version of BaseX.
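To illustrate the kinds of {{Code|contains text}} expressions mentioned above, here is a minimal sketch. It assumes that a database named {{Code|factbook}} exists and that its full-text index has been created; the search terms are arbitrary examples. Whether a particular expression is actually rewritten for index access depends on the optimizer and can be checked in the Info View.

```xquery
(: plain token search on text nodes :)
db:open('factbook')//country[name/text() contains text 'Germany'],
(: fuzzy search (BaseX-specific extension): tolerates small spelling differences :)
db:open('factbook')//country[name/text() contains text 'Germny' using fuzzy],
(: wildcard search: '.*' matches any sequence of characters :)
db:open('factbook')//country[name/text() contains text 'Ger.*' using wildcards]
```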

If the full-text index exists, the following queries will all be rewritten for index access:

If main memory runs out while creating a value index, the currently generated index structures will be partially written to disk and eventually merged. If the memory heuristics fail for some reason (e.g., because multiple index operations run at the same time), fixed index split sizes may be chosen via the [[Options#INDEXSPLITSIZE|INDEXSPLITSIZE]] and [[Options#FTINDEXSPLITSIZE|FTINDEXSPLITSIZE]] options.

If [[Options#DEBUG|DEBUG]] is set to true and a new database is created from the command line, the number of index operations will be written to standard output; this may help you choose a proper split size. The following example shows what the output can look like for a 111 MB document and 128 MB of available main memory:

<pre>

</pre>

The info string {{Code|3 splits}} indicates that three partial full-text index structures were written to disk, and the string {{Code|12089347 operations}} indicates that the index construction comprised approximately 12 million index operations. If we set [[Options#FTINDEXSPLITSIZE|FTINDEXSPLITSIZE]] to the fixed value {{Code|4000000}} (12 million divided by three), or a smaller value, we should be able to build the index and circumvent the memory heuristics.
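The arithmetic behind this estimate can be sketched in XQuery. The two numbers are taken from the info strings quoted above; the variable names are chosen for illustration only.

```xquery
(: values taken from the info strings quoted above :)
let $operations := 12089347
let $splits := 3
(: integer division: average number of index operations per split :)
return $operations idiv $splits
```

The query returns {{Code|4029782}}, which is rounded down to {{Code|4000000}} in the text. Assuming the standard {{Code|SET}} command is used, the value could then be assigned with {{Code|SET FTINDEXSPLITSIZE 4000000}} before the database is created.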

=Updates=

when requested by the database users.

With the [[Options#UPDINDEX|UPDINDEX]] option, text and attribute index structures will be incrementally updated. This option must be turned on before the database is created or optimized.
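As a hedged sketch of how this option might be enabled at creation time from within XQuery: the example assumes that {{Code|db:create}} accepts {{Code|updindex}} in its options map, and the database name and input path are placeholders.

```xquery
(: create the database with incremental index updates enabled;
   'factbook' and 'factbook.xml' are placeholder names :)
db:create(
  'factbook',
  'factbook.xml',
  'factbook.xml',
  map { 'updindex': true() }
)
```

On the command line, the equivalent would presumably be to run {{Code|SET UPDINDEX true}} before the {{Code|CREATE DB}} command.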

Amanda (administrator, editor; 43 edits)
