Difference between revisions of "Indexes"
Line 1: | Line 1: | ||
− | This article is part of the [[Advanced User's Guide]] and introduces the available index structures, which | + | This article is part of the [[Advanced User's Guide]] and introduces the available index structures, which are utilized by the query optimizer to rewrite expressions and speed up query evaluation. |
− | + | The examples in this article are based on the [http://files.basex.org/xml/factbook.xml factbook.xml] document. To see how a query is rewritten, please turn on the [[GUI#Visualizations|Info View]] in the GUI or use the [[Startup_Options#BaseX_Standalone|-V flag]] on command line. | |
− | |||
− | |||
− | + | =Structural Indexes= | |
Structural indexes will always be present and cannot be dropped by the user: | Structural indexes will always be present and cannot be dropped by the user: | ||
− | + | ==Name Index== | |
− | |||
− | + | The name index contains all element and attribute names of a database, and the fixed-size index ids are stored in the main database table. If a database is updated, new names are automatically added. The index is further enriched with statistical information, which will get out-of-dated after new updates. | |
− | |||
− | + | The name index is applied to pre-evaluate location steps that will never yield results: | |
− | |||
− | = | + | <pre class="brush:xquery"> |
+ | (: will be rewritten to an empty sequence :) | ||
+ | /non-existing-name | ||
+ | </pre> | ||
− | + | ==Path Index== | |
− | + | The path index (also called ''path summary'') stores all distinct paths of the documents in the database. It also contains additional statistical information. Currently, the index will get out-of-dated after new updates. | |
− | |||
− | + | The path index is applied to rewrite descendant steps to multiple child steps. Child steps can be evaluated faster, as less nodes have to be accessed: | |
− | |||
− | + | <pre class="brush:xquery"> | |
− | : | + | doc('factbook.xml')//province, |
+ | (: ...will be rewritten to... :) | ||
+ | doc('factbook.xml')/mondial/country/province | ||
+ | </pre> | ||
− | + | The paths statistics are e.g. used to pre-evaluate the {{Mono|count()}} function: | |
+ | |||
+ | <pre class="brush:xquery"> | ||
+ | (: will be rewritten and pre-evaluated by the path index :) | ||
+ | count( doc('factbook')//country ) | ||
+ | </pre> | ||
+ | |||
+ | ==Document Index== | ||
+ | |||
+ | The document index contains references to the {{Mono|pre}} values of all document nodes. It speeds up the access to specific documents in a database, and it will be automatically updated when updates are performed. | ||
+ | |||
+ | The following query will be sped up by the document index: | ||
+ | |||
+ | <pre class="brush:xquery"> | ||
+ | db:open('DatabaseWithLotsOfDocuments') | ||
+ | </pre> | ||
+ | |||
+ | =Value Indexes= | ||
+ | |||
+ | Value indexes can be optionally created and dropped by the user. The text and attribute index will be created by default: | ||
+ | |||
+ | ==Text Index== | ||
+ | |||
+ | This index speeds up string-based equality tests and (since {{Version|7.2.1}}) range queries on text nodes. The [[Options#UPDINDEX|UPDINDEX]] option can be activated to keep this index up-to-date. | ||
+ | |||
+ | The following queries will all be rewritten for index access: | ||
+ | |||
+ | <pre class="brush:xquery"> | ||
+ | //*[text() = 'Germany'], | ||
+ | doc('factbook.xml')//name[. = 'Germany'], | ||
+ | db:open('factbook')//country[.//city/name = 'Hanoi']/name | ||
+ | </pre> | ||
+ | |||
+ | ==Attribute Index== | ||
+ | |||
+ | Similar to the text index, this index speeds up string-based tests on attribute values. The [[Options#UPDINDEX|UPDINDEX]] option can be activated to keep this index up-to-date. | ||
+ | |||
+ | The following queries will all be rewritten for index access: | ||
+ | |||
+ | <pre class="brush:xquery"> | ||
+ | //country[@car_code = 'J'], | ||
+ | //province[@* = 'Hokkaido']//name, | ||
+ | //sea[@depth > '2100' and @depth < '4000'] | ||
+ | </pre> | ||
+ | |||
+ | ==Full-Text Index== | ||
+ | |||
+ | The [[Full-Text]] index speeds up queries using the {{Mono|contains text}} expression. Internally, two index structures are provided: the default index sorts all keys alphabetically by their character length. It is particularly fast if fuzzy searches are performed. The second index is a compressed trie structure, which needs slightly more memory, but is specialized on wildcard searches. Both index structures will be merged in a future version of BaseX. | ||
− | |||
− | |||
The following queries are examples for expressions that will be optimized for index access (provided that the relevant index exists in a particular database): | The following queries are examples for expressions that will be optimized for index access (provided that the relevant index exists in a particular database): | ||
− | |||
− | |||
− | |||
− | |||
− | + | If the full-text index exists, the following queries will all be rewritten for index access: | |
− | |||
− | |||
− | |||
− | + | <pre class="brush:xquery"> | |
− | + | //country/name[text() contains text 'and'], | |
− | + | //religions[. contains text { 'Catholic', 'Roman' } | |
− | + | using case insensitive distance at most 2 words] | |
− | + | </pre> | |
− | |||
− | |||
− | |||
− | |||
[[Category:Internals]] | [[Category:Internals]] |
Revision as of 15:10, 17 April 2012
This article is part of the Advanced User's Guide and introduces the available index structures, which are utilized by the query optimizer to rewrite expressions and speed up query evaluation.
The examples in this article are based on the factbook.xml document. To see how a query is rewritten, please turn on the Info View in the GUI or use the -V flag on command line.
Contents
Structural Indexes
Structural indexes will always be present and cannot be dropped by the user:
Name Index
The name index contains all element and attribute names of a database, and the fixed-size index ids are stored in the main database table. If a database is updated, new names are automatically added. The index is further enriched with statistical information, which will get out-of-dated after new updates.
The name index is applied to pre-evaluate location steps that will never yield results:
(: will be rewritten to an empty sequence :) /non-existing-name
Path Index
The path index (also called path summary) stores all distinct paths of the documents in the database. It also contains additional statistical information. Currently, the index will get out-of-dated after new updates.
The path index is applied to rewrite descendant steps to multiple child steps. Child steps can be evaluated faster, as less nodes have to be accessed:
doc('factbook.xml')//province, (: ...will be rewritten to... :) doc('factbook.xml')/mondial/country/province
The paths statistics are e.g. used to pre-evaluate the count()
function:
(: will be rewritten and pre-evaluated by the path index :) count( doc('factbook')//country )
Document Index
The document index contains references to the pre
values of all document nodes. It speeds up the access to specific documents in a database, and it will be automatically updated when updates are performed.
The following query will be sped up by the document index:
db:open('DatabaseWithLotsOfDocuments')
Value Indexes
Value indexes can be optionally created and dropped by the user. The text and attribute index will be created by default:
Text Index
This index speeds up string-based equality tests and (since Version 7.2.1) range queries on text nodes. The UPDINDEX option can be activated to keep this index up-to-date.
The following queries will all be rewritten for index access:
//*[text() = 'Germany'], doc('factbook.xml')//name[. = 'Germany'], db:open('factbook')//country[.//city/name = 'Hanoi']/name
Attribute Index
Similar to the text index, this index speeds up string-based tests on attribute values. The UPDINDEX option can be activated to keep this index up-to-date.
The following queries will all be rewritten for index access:
//country[@car_code = 'J'], //province[@* = 'Hokkaido']//name, //sea[@depth > '2100' and @depth < '4000']
Full-Text Index
The Full-Text index speeds up queries using the contains text
expression. Internally, two index structures are provided: the default index sorts all keys alphabetically by their character length. It is particularly fast if fuzzy searches are performed. The second index is a compressed trie structure, which needs slightly more memory, but is specialized on wildcard searches. Both index structures will be merged in a future version of BaseX.
The following queries are examples for expressions that will be optimized for index access (provided that the relevant index exists in a particular database):
If the full-text index exists, the following queries will all be rewritten for index access:
//country/name[text() contains text 'and'], //religions[. contains text { 'Catholic', 'Roman' } using case insensitive distance at most 2 words]