Difference between revisions of "Indexes"

From BaseX Documentation

Revision as of 06:04, 7 February 2012

This article is part of the Advanced User's Guide and introduces the available index structures, which may speed up querying by orders of magnitude.

Index Structures

Currently, the following index structures exist in BaseX:

Tag/Attribute Name Index
All element and attribute names are automatically indexed and enriched with statistical information.
Path Summary
Unique paths in a document or collection are referenced by the path index, which is used, for example, to rewrite descendant steps into more specific child steps.
Document Index
This index caches references to all document nodes in a database. It provides fast access to single documents in large database instances.
Text Index
This index speeds up equality tests on text nodes in XPath location steps with predicates.
Attribute Index
This index speeds up equality tests on attribute values in XPath location steps with predicates.
Full-Text Index
Full-Text indexes are very powerful structures, which are mainly used in information retrieval use cases.

With Version 7.1, a new database option (UPDINDEX) was introduced to support incremental indexing of texts and attributes.

Example Queries

The following queries will be optimized for index access (provided that the relevant index exists in a particular database):

Name/Path Index

  • //name may be rewritten to /addressbook/address/name
  • /non-existing-name may be rewritten to an empty sequence
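
The two rewrites above can be illustrated with a small sketch. It is not BaseX's implementation; it only assumes that a path summary records every distinct root-to-element path in a document. The function names `build_path_summary` and `resolve_descendant` and the sample address book are made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def build_path_summary(root):
    """Count how often each distinct root-to-element path occurs."""
    summary = Counter()

    def visit(elem, prefix):
        path = prefix + "/" + elem.tag
        summary[path] += 1
        for child in elem:
            visit(child, path)

    visit(root, "")
    return summary


def resolve_descendant(summary, name):
    """Return all distinct paths whose last step is the given element name."""
    return sorted(p for p in summary if p.rsplit("/", 1)[1] == name)


# Hypothetical sample document.
doc = ET.fromstring(
    "<addressbook>"
    "<address><name>Ann</name><city>Oslo</city></address>"
    "<address><name>Bob</name><city>Rome</city></address>"
    "</addressbook>"
)
summary = build_path_summary(doc)

# //name can be replaced by the single concrete path found in the summary.
paths_for_name = resolve_descendant(summary, "name")

# A name that never occurs yields no paths, so the query result must be empty.
paths_for_missing = resolve_descendant(summary, "non-existing-name")
```

Because the summary is much smaller than the document itself, both decisions can be made at compile time, before any document node is touched.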

Text Index

  • //node()[text() = 'Usability']
  • //div[p = 'Usability' or p = 'Testing']
  • path/to/relevant[text() = 'Usability Testing']/and/so/on
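
As a rough illustration of why these queries benefit from a text index, the following sketch maps each text value to the elements that contain it. This is a simplified model and not BaseX's internal structure; `build_text_index` and the sample document are hypothetical, and the step from a matching text node back to the enclosing path is left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_text_index(root):
    """Map each non-empty text value to the elements that contain it."""
    index = defaultdict(list)
    for elem in root.iter():
        if elem.text and elem.text.strip():
            index[elem.text].append(elem)
    return index


# Hypothetical sample document.
doc = ET.fromstring(
    "<html>"
    "<div id='a'><p>Usability</p></div>"
    "<div id='b'><p>Testing</p></div>"
    "<div id='c'><p>Other</p></div>"
    "</html>"
)
text_index = build_text_index(doc)

# An equality test on a text value becomes one lookup instead of a full scan.
hits = text_index.get("Usability", [])
hit_tags = [e.tag for e in hits]

# An "or" of two equality tests becomes the union of two lookups.
either = text_index.get("Usability", []) + text_index.get("Testing", [])
either_texts = sorted(e.text for e in either)
```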

Attribute Index

  • //node()[@align = 'right']
  • descendant::elem[@id = '1']
  • range/query[@id >= 1 and @id <= 5]
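
The range query in the last example relies on the index keeping its keys in sorted order. The sketch below uses a sorted list with binary search as a stand-in for a tree-based index; it is an assumption made for illustration, not the BaseX data structure. The class `SortedValueIndex`, the numeric keys, and the node identifiers are all invented for this example.

```python
import bisect


class SortedValueIndex:
    """Sorted list of (key, node id) pairs; a stand-in for a tree-based index."""

    def __init__(self, pairs):
        self.entries = sorted(pairs)
        self.keys = [key for key, _ in self.entries]

    def between(self, low, high):
        """Return node ids whose key lies in the inclusive range [low, high]."""
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_right(self.keys, high)
        return [node for _, node in self.entries[start:end]]

    def exact(self, key):
        """An exact match is a range whose bounds coincide."""
        return self.between(key, key)


# Hypothetical data: @id values paired with made-up node identifiers.
index = SortedValueIndex([(7, "n7"), (1, "n1"), (3, "n3"), (5, "n5"), (9, "n9")])

in_range = index.between(1, 5)  # corresponds to [@id >= 1 and @id <= 5]
only_three = index.exact(3)     # corresponds to [@id = 3]
```

Both lookups locate their boundaries by binary search and then read a contiguous slice, which is why exact matches and range queries can share one structure.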

Full-Text Index

  • //node[text() contains text 'Usability']
  • //node[text() contains text 'Usebiliti' using fuzzy]
  • //book[chapter contains text ('web' ftor 'WWW' using no stemming) ftand 'diversity' using stemming distance at most 5 words]

Index data structures

Text/Attribute Index
Both the text index and the attribute index are based on a balanced B-tree and support exact matches and range queries.
Full-Text Index (Standard)
The standard full-text index is implemented as a sorted array structure. It is optimized for simple and fuzzy searches.
Full-Text Index (Wildcards enabled)
A second full-text index is implemented as a compressed trie. It needs slightly more memory than the standard full-text index, but it supports more features, such as full wildcard search.
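
To show why a trie lends itself to wildcard search, here is a minimal sketch of a plain (uncompressed) trie that answers trailing-wildcard queries such as `usab*`. It is deliberately simpler than the compressed trie described above and makes no claim about BaseX's actual layout; the classes `TrieNode` and `Trie` and the token list are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # next character -> child node
        self.is_token = False  # True if a token ends at this node


class Trie:
    def __init__(self, tokens=()):
        self.root = TrieNode()
        for token in tokens:
            self.insert(token)

    def insert(self, token):
        node = self.root
        for ch in token:
            node = node.children.setdefault(ch, TrieNode())
        node.is_token = True

    def with_prefix(self, prefix):
        """Return all stored tokens starting with prefix, in sorted order."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        stack = [(node, prefix)]
        while stack:
            current, word = stack.pop()
            if current.is_token:
                found.append(word)
            for ch, child in current.children.items():
                stack.append((child, word + ch))
        return sorted(found)


# Hypothetical token list, as a full-text indexer might extract it.
trie = Trie(["usability", "usable", "user", "testing"])
usab = trie.with_prefix("usab")  # tokens matching the pattern usab*
```

All tokens sharing a prefix live in one subtree, so a trailing wildcard only requires walking down the prefix and enumerating that subtree. A compressed trie merges chains of single-child nodes to save space; the lookup idea stays the same.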