Difference between revisions of "Indexes"
Revision as of 07:05, 22 September 2011
This article is part of the Query Portal. It enumerates the index structures available in BaseX.
Existing Indexes
Indexes can speed up queries by orders of magnitude. Currently, four indexes exist:
- Text Index: This index speeds up text comparisons in predicates.
- Attribute Index: This index speeds up attribute value comparisons in predicates.
- Full-Text Index: Full-text queries are sped up by this index.
- Path Summary: This index speeds up the resolution of location paths.
Examples of using the indexes
Here are some examples of queries that are rewritten for index access:
Text-Based Queries
//node()[text() = 'Usability']
//div[p = 'Usability' or p = 'Testing']
path/to/relevant[text() = 'Usability Testing']/and/so/on
Attribute Index
//node()[@align = 'right']
descendant::elem[@id = '1']
range/query[@id >= 1 and @id <= 5]
Full-Text Index
//node[text() contains text 'Usability']
//node[text() contains text 'Usebiliti' using fuzzy]
//book[chapter contains text ('web' ftor 'WWW' using no stemming) ftand 'diversity' using stemming distance at most 5 words]
The full-text index is optimized to support all features of the XQuery Full Text Recommendation.
BaseX extends the specification by offering a fuzzy match option. Fuzzy search is based on the Levenshtein algorithm; the longer the query terms are, the more errors are tolerated.
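The fuzzy matching idea can be illustrated with a minimal Python sketch. It is not the BaseX implementation: the function names are hypothetical, and the rule of one tolerated error per four characters is an assumption chosen only to show how tolerance can grow with term length.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]


def allowed_errors(term: str) -> int:
    # Assumption for illustration: one error per four characters.
    return len(term) // 4


def fuzzy_match(query_term: str, indexed_term: str) -> bool:
    """True if indexed_term is within the error budget of query_term."""
    return levenshtein(query_term, indexed_term) <= allowed_errors(query_term)
```

With this rule, the misspelled term from the example above, `fuzzy_match('Usebiliti', 'Usability')`, succeeds because two substitutions fit into the budget of a nine-character term, while a three-character term would have to match exactly.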
The default "Case Sensitivity", "Stemming" and "Diacritics" options are taken into account when the index is created. Consequently, all queries that use the default index options will be sped up.
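Why only queries with matching options benefit can be sketched in Python. This is a simplified illustration, not BaseX code: the function names are hypothetical, stemming is left out, and the defaults assumed here (case-insensitive, diacritics-insensitive) are assumptions of the sketch.

```python
import unicodedata


def normalize(token: str, case_sensitive: bool = False,
              diacritics: bool = False) -> str:
    """Normalize a token according to the given index options."""
    if not diacritics:
        # Decompose characters and drop combining marks (accents).
        decomposed = unicodedata.normalize("NFD", token)
        token = "".join(ch for ch in decomposed
                        if not unicodedata.combining(ch))
    if not case_sensitive:
        token = token.lower()
    return token


def build_index(tokens, **options):
    """Map each normalized token to the positions where it occurs."""
    index = {}
    for pos, tok in enumerate(tokens):
        index.setdefault(normalize(tok, **options), []).append(pos)
    return index


def lookup(index, term, **options):
    """Look up a term; the options must equal those used at creation time."""
    return index.get(normalize(term, **options), [])
```

Because the stored keys are already normalized, a query that asks for different options (for example a case-sensitive match) cannot be answered from these keys alone and has to fall back to scanning the data.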
Index data structures
- Text/Attribute Index: Both the text and the attribute index are based on a balanced B-Tree and support exact matches and range queries.
- Full-Text Index (Standard): The standard full-text index is implemented as a sorted array structure. It is optimized for simple and fuzzy searches.
- Full-Text Index (Wildcards enabled): A second full-text index is implemented as a compressed trie. It needs slightly more memory than the standard full-text index, but it supports more features, such as full wildcard search.
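How an ordered structure serves both exact matches and range queries can be shown with a small Python sketch. It uses a sorted list with binary search as a stand-in for the ordered storage. It is not a B-Tree and not the BaseX implementation, and the class and method names are hypothetical.

```python
from bisect import bisect_left, bisect_right


class SortedValueIndex:
    """Maps values to node ids, with entries kept in sorted order."""

    def __init__(self, pairs):
        # pairs: iterable of (value, node_id) tuples
        self._entries = sorted(pairs)
        self._keys = [value for value, _ in self._entries]

    def exact(self, value):
        """Node ids whose value equals the given value."""
        return self.range(value, value)

    def range(self, low, high):
        """Node ids whose value lies in the closed interval [low, high]."""
        start = bisect_left(self._keys, low)
        end = bisect_right(self._keys, high)
        return [node_id for _, node_id in self._entries[start:end]]
```

A predicate such as `[@id >= 1 and @id <= 5]` corresponds to a single `range(1, 5)` call here: two binary searches find the boundaries, and every entry between them is a hit, so no other values need to be inspected.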