Difference between revisions of "Indexes"

From BaseX Documentation
Jump to navigation Jump to search
Line 1: Line 1:
This article is part of the [[Querying|Query Portal]].
+
This article is part of the [[Querying|Query Portal]] and introduces various index structures, which may speed up many queries by orders of magnitudes.
It enumerates the index structures available in BaseX.
 
  
==Existing Indexes==  
+
==Index Structures==  
 
   
 
   
Indexes may speed up queries by orders of magnitudes. Currently, four indexes exist:
+
Currently, the following index structures exist in BaseX:
  
 +
;Tag/Attribute Name Index
 +
:All tags and attribute names are automatically indexed and enriched with statistical information.
 +
;Path Summary
 +
:Unique paths in a document or collection are referenced by the path index, which is applied e.g. to rewrite descendant to more specific child steps.
 
;Text Index
 
;Text Index
:This index speeds up text comparisons in predicates.
+
:This index speeds up equality tests on text nodes in XPath predicates.
 
;Attribute Index
 
;Attribute Index
:This index speeds up attribute value comparisons in predicates.
+
:This index speeds up equality tests on attribute value in XPath predicates.
 
;Full-Text Index
 
;Full-Text Index
:Full-text queries are sped up by this index.
+
:[[Full-text]] indexes are very powerful structures, which are mainly used in information retrieval use cases.
;Path Summary
 
:This index speeds up the resolution of location paths.
 
  
 
==Examples of using the indexes==  
 
==Examples of using the indexes==  

Revision as of 06:52, 11 December 2011

This article is part of the Query Portal and introduces various index structures, which may speed up many queries by orders of magnitudes.

Index Structures

Currently, the following index structures exist in BaseX:

Tag/Attribute Name Index
All tags and attribute names are automatically indexed and enriched with statistical information.
Path Summary
Unique paths in a document or collection are referenced by the path index, which is applied e.g. to rewrite descendant to more specific child steps.
Text Index
This index speeds up equality tests on text nodes in XPath predicates.
Attribute Index
This index speeds up equality tests on attribute value in XPath predicates.
Full-Text Index
Full-text indexes are very powerful structures, which are mainly used in information retrieval use cases.

Examples of using the indexes

Here are some examples for queries which are rewritten for index access:

Text-Based Queries

  • //node()[text() = 'Usability']
  • //div[p = 'Usability' or p = 'Testing']
  • path/to/relevant[text() = 'Usability Testing']/and/so/on

Attribute Index

  • //node()[@align = 'right']
  • descendant::elem[@id = '1']
  • range/query[@id >= 1 and @id <= 5]

Full-Text Index

  • //node[text() contains text 'Usability']
  • //node[text() contains text 'Usebiliti' using fuzzy]
  • //book[chapter contains text ('web' ftor 'WWW' using no stemming) ftand 'diversity' using stemming distance at most 5 words]

The full-text index is optimized to support all features of the XQuery Full Text Recommendation.

BaseX extends the specification offering a fuzzy match option. Fuzzy search is based on the Levenshtein algorithm; the longer query terms are, the more errors will be tolerated.

Default "Case Sensitivity", "Stemming" and "Diacritics" options will be considered in the index creation. Consequently, all queries will be sped up which use the default index options.

Index data structures

Text/Attribute Index
Both the text and attribute index are based on a balanced B-Tree and support exact matches and range queries.
Full-Text Index (Standard)
The standard full-text index is implemented as sorted array structure. It is optimized for simple and fuzzy searches.
Full-Text Index (Wildcards enabled)
A second full-text index is implemented as a compressed trie. It needs slightly more memory than the standard full-text index, but it supports more features, such as full wildcard search.