Changes

Indexes (edit)

Revision as of 15:46, 25 July 2022

17 bytes removed , 15:46, 25 July 2022

no edit summary

This article is part of the [[XQuery|XQuery Portal]]. It contains information on the available index structures.

The query compiler tries to optimize and speed up queries by applying an index whenever this is possible and seems promising. To see how a query is rewritten, and whether an index is used, you can turn on the [[GUI#Visualizations|Info View]] in the GUI or use the [[Command-Line Options#BaseX_Standalone|-V flag]] on the command line:

* A message like <code>apply text index for "Japan"</code> indicates that the text index is applied to speed up the search of the shown string. The following message…

* <code>no index results</code> indicates that a string in a path expression will never yield results. Hence, the path does not need to be evaluated at all.

* If you cannot find any index optimization hints in the info output, it often helps if you rewrite and simplify your query.

=Value Indexes=

Value indexes can be created and dropped by the user. Four types of value indexes are available: a text and attribute index, and an optional token and full-text index. By default, the text and attribute indexes are created automatically.

In the GUI, index structures can be managed in the dialog windows for creating new databases or displaying the database properties. On the command line, the commands {{Command|CREATE INDEX}} and {{Command|DROP INDEX}} are used to create and drop index structures. With {{Command|INFO INDEX}}, you get some insight into the contents of an index structure, and {{Command|SET}} allows you to change the index defaults for new databases:
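The same index choices can presumably also be made from XQuery when a database is created. The following is a minimal sketch: the database name {{Code|index-demo}}, the inline document and the path {{Code|countries.xml}} are made up for illustration, and it assumes that the options map accepts the lower-case names of the indexing options:

<syntaxhighlight lang="xquery">
(: Hypothetical example: creates a small database and requests
   a token and a full-text index in addition to the defaults. :)
db:create(
  'index-demo',
  <countries><country name="Japan">Japan</country></countries>,
  'countries.xml',
  map { 'tokenindex': true(), 'ftindex': true() }
)
</syntaxhighlight>

Passing the options with the call keeps the setting local to this database and leaves the global defaults untouched.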

==Token Index==

In many XML dialects, such as HTML or DITA, multiple tokens are stored in attribute values. The token index can be created to speed up the retrieval of these tokens. The XQuery functions {{Code|fn:contains-token}}, {{Code|fn:tokenize}} and {{Code|fn:idref}} are rewritten for index access whenever possible. If a token index exists, it will, for example, be utilized for the following queries:
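As an illustration, queries of the following shape are candidates for a token index rewriting. This is only a sketch: it assumes that a database with HTML-like content and a token index is currently opened, and the element names and the token {{Code|row}} are invented for the example:

<syntaxhighlight lang="xquery">
(: Assumption: an opened database with a token index serves as context. :)

(: elements whose class attribute contains the token 'row' :)
//div[contains-token(@class, 'row')],

(: equivalent test based on tokenization of the attribute value :)
//div[tokenize(@class) = 'row']
</syntaxhighlight>

Whether the index is actually applied can be checked in the compilation info of the query.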

==Full-Text Index==

The [[Full-Text]] index contains the normalized tokens of text nodes of a document. It is utilized to speed up queries with the {{Code|contains text}} expression, and it is capable of processing wildcard and fuzzy search operations. Three evaluation strategies are available: the standard sequential database scan, an evaluation based on the full-text index, and a hybrid one, combining both strategies (see [https://files.basex.org/publications/Gruen%20et%20al.%20%5B2009%5D,%20XQuery%20Full%20Text%20Implementation%20in%20BaseX.pdf XQuery Full Text implementation in BaseX]).

If the full-text index exists, the following queries will all be rewritten for index access:
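The following sketch shows the kind of expressions that are meant. It assumes that a factbook-like database with a full-text index is opened as context; the element names and search terms are illustrative, and {{Code|using fuzzy}} is a BaseX-specific match option:

<syntaxhighlight lang="xquery">
(: Assumption: an opened database with a full-text index serves as context. :)

(: plain token search :)
//country[name/text() contains text 'and'],

(: wildcard search: '.*' stands for zero or more characters :)
//country[name/text() contains text 'Jap.*' using wildcards],

(: fuzzy search that tolerates small spelling differences :)
//country[name/text() contains text 'Japn' using fuzzy]
</syntaxhighlight>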


With {{Command|CREATE INDEX}} and {{Function|Database|db:optimize}}, new selective indexing options will be applied to an existing database.
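A possible call could look as follows. This is a sketch under assumptions: the database {{Code|factbook}} must already exist, the attribute names {{Code|id}} and {{Code|name}} are examples, and the options map is assumed to accept {{Code|attrinclude}} in lower case:

<syntaxhighlight lang="xquery">
(: Rebuilds all index structures of an existing database and restricts
   the attribute index to the (example) attributes 'id' and 'name'. :)
db:optimize(
  'factbook',
  true(),
  map { 'attrindex': true(), 'attrinclude': 'id,name' }
)
</syntaxhighlight>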

==Enforce Rewritings==

In various cases, existing index structures will not be utilized by the query optimizer. This is usually the case if the name of the database is not a static string (e.g., because it is bound to a variable or passed on as an argument of a function call). Furthermore, several candidates for index rewritings may exist, and the query optimizer may choose a rewriting that turns out to be suboptimal.

With the {{Option|ENFORCEINDEX}} option, certain index rewritings can be enforced. While the option can be globally enabled, it is usually better to supply it as a [[XQuery Extensions#Pragmas|Pragma]]. Two examples:
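The pragma form could be sketched like this. The databases {{Code|persons1}} to {{Code|persons3}} and their structure are hypothetical, and {{Code|db:open}} is assumed to be available (more recent versions may provide it under the name {{Code|db:get}}):

<syntaxhighlight lang="xquery">
(: The database name is computed at runtime, so the optimizer cannot
   check statically whether an index exists; the pragma enforces
   the rewriting for the enclosed expression. :)
(# db:enforceindex #) {
  for $n in 1 to 3
  let $db := 'persons' || $n
  return db:open($db)//person[name/text() = 'John']
}
</syntaxhighlight>

Limiting the pragma to a single expression avoids enforcing rewritings in parts of the query where no suitable index exists.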


The option can also be assigned to predicates with dynamic values. In the following example, the first comparison will be rewritten for index access. Without the pragma expression, the second comparison is preferred and chosen for the rewriting, because the statically known string allows for an exact cost estimation:
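A query of this kind might look as follows. It is a sketch only: it assumes a {{Code|factbook}} database with a text index, the country names and the religion string are example values, and {{Code|db:open}} may be called {{Code|db:get}} in more recent versions:

<syntaxhighlight lang="xquery">
for $name in ('Germany', 'Italy')
for $country in db:open('factbook')//country
(: first comparison: dynamic value, rewriting enforced via the pragma :)
where (# db:enforceindex #) { $country/name = $name }
(: second comparison: static string, preferred without the pragma :)
where $country/religions/text() = 'Protestant'
return $country
</syntaxhighlight>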

=Custom Index Structures=

With XQuery, it is comparatively easy to create your own, custom index structures. The following query demonstrates how you can create a {{Code|factbook-index}} database, which contains all texts of the original database in lower case:
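One way to write such a query is sketched here. It assumes that a {{Code|factbook}} database exists; the element names {{Code|index}}, {{Code|word}} and {{Code|id}} are chosen freely for this example, and {{Code|db:open}} may be called {{Code|db:get}} in more recent versions:

<syntaxhighlight lang="xquery">
let $db-name := 'factbook'
let $index-name := $db-name || '-index'
let $words :=
  (: group all text nodes by their lower-case string value :)
  for $texts in db:open($db-name)//text()
  group by $key := lower-case($texts)
  return <word v="{ $key }">{
    (: remember the node id of every text node with this value :)
    for $text in $texts
    return <id>{ db:node-id($text) }</id>
  }</word>
return db:create($index-name, <index>{ $words }</index>, $index-name || '.xml')
</syntaxhighlight>

The stored node ids can later be used to address the original text nodes; since the attribute values of the new database are indexed by default, lookups on {{Code|@v}} can presumably benefit from the attribute index.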

=Performance=

If main memory runs out while creating a value index, the current index structures will be partially written to disk and eventually merged. If the memory heuristics fail for some reason (e.g., because multiple index operations run at the same time, or because the applied JVM does not support explicit garbage collections), a fixed index split size may be chosen via the {{Option|SPLITSIZE}} option.

If {{Option|DEBUG}} is enabled, the command-line output might help you to find a good split size. The following example shows the output for creating a database for an XMark document with 1 GB, and with 128 MB assigned to the JVM:


* The vertical bar <code>|</code> indicates that a partial index structure was written to disk.

* The mean value of the recommendations can be assigned to the {{Option|SPLITSIZE}} option. Please note that the recommendation is only a vague proposal, so try different values if you get out-of-memory errors or indexing gets too slow. Greater values will require more main memory.

* In the example, the full-text index was split 12 times. 116 million tokens were indexed, processing time was 2.5 minutes, and final main memory consumption (after writing the index to disk) was 76 MB. A good value for the split size option could be {{Code|15}}.

=Updates=

* After the execution of one or more update operations, the {{Command|OPTIMIZE}} command or the {{Function|Database|db:optimize}} function can be called to rebuild the index structures.

* The {{Option|UPDINDEX}} option can be activated before creating or optimizing the database. As a result, the text, attribute and token indexes will be incrementally updated after each database update. Please note that incremental updates are not available for the full-text index and database statistics. This also explains why the UPTODATE flag, which is e.g. displayed via {{Command|INFO DB}} or {{Function|Database|db:info}}, will be set to {{Code|false}} until the database is optimized again. While the flag is {{Code|false}}, various optimizations won’t be triggered: for example, count(//item) can be extremely fast if all metadata is up-to-date.

* The {{Option|AUTOOPTIMIZE}} option can be enabled before creating or optimizing the database. All outdated index structures and statistics will then be recreated after each database update. This option should only be enabled for small and medium-sized databases.

* Both options can be used side by side: {{Option|UPDINDEX}} will take care that the value index structures will be updated as part of the actual update operation. {{Option|AUTOOPTIMIZE}} will update the remaining data structures (full-text index, database statistics).
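Enabling both options for an existing database could be sketched as follows. The database name {{Code|factbook}} is an example, and the call assumes that the options map accepts the lower-case option names and that a full optimization (second argument {{Code|true()}}) is requested:

<syntaxhighlight lang="xquery">
(: Rebuilds all structures and activates incremental value index
   updates as well as automatic optimization for later updates. :)
db:optimize(
  'factbook',
  true(),
  map { 'updindex': true(), 'autooptimize': true() }
)
</syntaxhighlight>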

=Changelog=

;Version 9.1

* Updated: [[#Enforce Rewritings|Enforce Rewritings]], support for comparisons with dynamic values.

;Version 9.0

* Added: [[#Enforce Rewritings|Enforce Rewritings]]

;Version 8.4

* Updated: [[#Name Index|Name Index]], [[#Path Index|Path Index]]

* Added: [[#Token Index|Token Index]]

;Version 8.3

* Added: [[#Selective Indexing|Selective Indexing]]

;Version 8.0

* Added: AUTOOPTIMIZE option

;Version 7.2.1

* Added: string-based range queries

CG

