Indexes
Revision as of 06:10, 7 February 2012
This article is part of the Advanced User's Guide and introduces the available index structures, which may speed up querying by orders of magnitude.
Index Structures
Currently, the following index structures exist in BaseX:
- Tag/Attribute Name Index
- All element and attribute names are automatically indexed and enriched with statistical information.
- Path Summary
- Unique paths in a document or collection are referenced by the path index, which is used, for example, to rewrite descendant steps to more specific child steps.
- Document Index
- This index caches references to all document nodes in a database. It provides fast access to single documents in large database instances.
- Text Index
- This index speeds up equality tests and simple range queries on text nodes in XPath location steps with predicates.
- Attribute Index
- This index speeds up equality tests and simple range queries on attribute values in XPath location steps with predicates.
- Full-Text Index
- This index speeds up queries using the contains text keyword. Internally, BaseX handles two different index structures: the default index sorts all keys by their character length and, within each length, alphabetically; it is particularly fast for fuzzy searches. The second index is a compressed trie structure, which needs slightly more memory but is specialized for wildcard searches.
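The benefit of grouping keys by length can be illustrated with a minimal Python sketch. It is not BaseX's actual (Java) implementation: the class LengthSortedIndex, the levenshtein helper, and the reading of a fuzzy match as "edit distance at most max_errors" are assumptions made for illustration, and tokens are assumed to be already normalized (e.g. lowercased). The idea it shows is that a token whose length differs from the query by more than k characters cannot be within edit distance k, so only 2k + 1 length groups have to be scanned, while exact lookups need a single binary search in one group.

```python
from bisect import bisect_left
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


class LengthSortedIndex:
    """Hypothetical sketch, not BaseX code: tokens grouped by length,
    each group sorted alphabetically, each token mapped to sorted node ids."""

    def __init__(self, postings: dict[str, list[int]]):
        self.postings = postings
        groups = defaultdict(list)
        for token in postings:
            groups[len(token)].append(token)
        self.by_length = {n: sorted(tokens) for n, tokens in groups.items()}

    def exact(self, token: str) -> list[int]:
        # Only the group with the query's length is searched (binary search).
        group = self.by_length.get(len(token), [])
        i = bisect_left(group, token)
        if i < len(group) and group[i] == token:
            return self.postings[token]
        return []

    def fuzzy(self, token: str, max_errors: int) -> list[int]:
        # A token whose length differs by more than max_errors cannot be
        # within max_errors edits, so only 2 * max_errors + 1 groups are scanned.
        hits = set()
        for n in range(len(token) - max_errors, len(token) + max_errors + 1):
            for candidate in self.by_length.get(n, []):
                if levenshtein(token, candidate) <= max_errors:
                    hits.update(self.postings[candidate])
        return sorted(hits)


index = LengthSortedIndex({"usability": [3, 17], "testing": [5], "usable": [8]})
print(index.exact("testing"))       # [5]
print(index.fuzzy("usebiliti", 2))  # [3, 17]
```

In the fuzzy call, "usable" is never compared at all, because its length (6) is more than two characters away from the query's length (9).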
With Version 7.1, a new database option (UPDINDEX) was introduced to support incremental indexing of texts and attributes.
Example Queries
The following queries are examples of expressions that will be optimized for index access (provided that the relevant index exists in a particular database):
Name/Path Index
- //name may be rewritten to /addressbook/address/name (if this is the only path leading to name elements)
- /non-existing-name may be rewritten to an empty sequence
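How a path summary enables these rewritings can be sketched in a few lines of Python with the standard library's xml.etree.ElementTree. The function names path_summary and rewrite_descendant are hypothetical, namespaces and non-element nodes are ignored, and BaseX's real path index is more elaborate than this set of tag paths. The sketch only shows the principle: every distinct root-to-element path is stored once, and //name is answered by picking the stored paths that end in name.

```python
import xml.etree.ElementTree as ET


def path_summary(root: ET.Element) -> set[tuple[str, ...]]:
    """Hypothetical sketch: collect every distinct root-to-element path as a tuple of tags."""
    paths = set()

    def walk(elem: ET.Element, prefix: tuple[str, ...]) -> None:
        path = prefix + (elem.tag,)
        paths.add(path)  # repeated paths are stored only once
        for child in elem:
            walk(child, path)

    walk(root, ())
    return paths


def rewrite_descendant(name: str, summary: set[tuple[str, ...]]) -> list[str]:
    """Return the child-step paths that //name can be rewritten to.
    One entry: a single child path; several: their union; none: the empty sequence."""
    return sorted("/" + "/".join(path) for path in summary if path[-1] == name)


doc = ET.fromstring(
    "<addressbook>"
    "<address><name>Ann</name><city>Rome</city></address>"
    "<address><name>Bob</name></address>"
    "</addressbook>"
)
summary = path_summary(doc)
print(rewrite_descendant("name", summary))               # ['/addressbook/address/name']
print(rewrite_descendant("non-existing-name", summary))  # []
```

Although the document contains two address elements, the summary holds only four paths, which is why it stays small even for large collections with a regular structure.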
Text Index
- //node()[text() = 'Usability']
- //div[p = 'Usability' or p = 'Testing']
- path/to/relevant[text() = 'Usability Testing']/and/so/on
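A text index can be pictured as a map from each distinct text value to the nodes that carry it. The following minimal Python sketch (hypothetical names build_text_index and divs_with_p_equal, built on xml.etree.ElementTree) evaluates the second example above by looking up both strings, merging the hits for the or, and stepping from each matching p to its parent div. It assumes every p contains a single text node, so that comparing the element with a string is the same as comparing its text node; it is not how BaseX stores or evaluates the index internally.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_text_index(root: ET.Element) -> dict[str, list[ET.Element]]:
    """Hypothetical sketch: map each text value to the elements holding it.
    Only an element's leading text (elem.text) is indexed; whitespace-only text is skipped."""
    index = defaultdict(list)
    for elem in root.iter():
        if elem.text is not None and elem.text.strip():
            index[elem.text].append(elem)
    return index


def divs_with_p_equal(root: ET.Element, index, values: list[str]) -> list[ET.Element]:
    """Evaluate //div[p = v1 or p = v2 ...] by index lookups on the given values."""
    # ElementTree has no parent pointers, so parent and position maps are built
    # here; a database can navigate to the parent and compare positions directly.
    parent = {child: par for par in root.iter() for child in par}
    position = {elem: i for i, elem in enumerate(root.iter())}
    found = set()
    for value in values:  # 'or' becomes the union of one lookup per value
        for elem in index.get(value, []):
            div = parent.get(elem)
            if elem.tag == "p" and div is not None and div.tag == "div":
                found.add(div)
    return sorted(found, key=position.__getitem__)  # back to document order


doc = ET.fromstring(
    "<body>"
    "<div id='a'><p>Usability</p></div>"
    "<div id='b'><p>Design</p></div>"
    "<div id='c'><p>Testing</p><p>Usability</p></div>"
    "</body>"
)
index = build_text_index(doc)
matches = divs_with_p_equal(doc, index, ["Usability", "Testing"])
print([div.get("id") for div in matches])  # ['a', 'c']
```

Div c is reached twice (once per value) but reported once, because the hits are collected in a set before being sorted back into document order.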
Attribute Index
- //node()[@align = 'right']
- descendant::elem[@id = '1']
- range/query[@id >= 1 and @id <= 5]
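For range predicates like the last example, an index whose keys are kept in sorted order lets all matching entries be found with two binary searches. The Python sketch below illustrates that general technique under stated assumptions and does not describe how BaseX actually stores its attribute index: the class NumericAttributeIndex is hypothetical, the integers stand in for node identifiers, and every indexed value is assumed to be numeric.

```python
from bisect import bisect_left, bisect_right


class NumericAttributeIndex:
    """Hypothetical sketch: (value, node id) pairs kept sorted by value, so a
    range predicate becomes two binary searches and one contiguous slice."""

    def __init__(self, entries: list[tuple[int, str]]):
        # Assumption: every attribute value can be read as a number.
        pairs = sorted((float(raw), node_id) for node_id, raw in entries)
        self.keys = [value for value, _ in pairs]
        self.nodes = [node_id for _, node_id in pairs]

    def between(self, low: float, high: float) -> list[int]:
        """Node ids whose value v satisfies low <= v <= high, in ascending id order."""
        start = bisect_left(self.keys, low)   # first key >= low
        end = bisect_right(self.keys, high)   # first key > high
        return sorted(self.nodes[start:end])


# Node ids paired with the raw text of their @id attribute
idx = NumericAttributeIndex([(10, "3"), (11, "7"), (12, "1"), (13, "5"), (14, "2.5")])
print(idx.between(1, 5))  # [10, 12, 13, 14]
```

Sorting the result by node id is an assumption standing in for document order, which a range over the value-sorted keys does not preserve on its own.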
Full-Text Index
- //node[text() contains text 'Usability']
- //node[text() contains text 'Usebiliti' using fuzzy]
- //book[chapter contains text ('web' ftor 'WWW' using no stemming) ftand 'diversity' using stemming distance at most 5 words]