Difference between revisions of "Indexes"

Revision as of 03:57, 23 March 2012

This article is part of the Advanced User's Guide. It introduces the available index structures, which may speed up querying by orders of magnitude.

Index Structures

Currently, the following index structures exist in BaseX:

Structural Indexes

Structural indexes are always present and cannot be dropped by the user:

Tag/Attribute Name Index

All element and attribute names are automatically indexed and enriched with statistical information.

Path Summary

Unique paths in a document or collection are referenced by the path index, which is used, for example, to rewrite descendant steps into more specific child steps.
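The idea behind such a rewrite can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the BaseX implementation: the function names `path_summary` and `candidate_paths` and the sample document are hypothetical. The summary records every distinct root-to-node path per element name. A name with exactly one path can be addressed by child steps, and a name with no path yields an empty result.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def path_summary(root):
    """Map each element name to the set of distinct paths on which it occurs."""
    summary = defaultdict(set)

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        summary[node.tag].add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return summary


def candidate_paths(summary, name):
    """Return the sorted child paths that a descendant step //name could be rewritten to.

    An empty list corresponds to an empty sequence; a single entry means the
    descendant step can be replaced by that one child path.
    """
    return sorted(summary.get(name, set()))


# Hypothetical sample document, used only for illustration.
doc = ET.fromstring(
    "<addressbook>"
    "<address><name>A</name></address>"
    "<address><name>B</name></address>"
    "</addressbook>"
)
summary = path_summary(doc)
print(candidate_paths(summary, "address"))
print(candidate_paths(summary, "non-existing-name"))
```

Here the first call returns one path for `address`, so a descendant step could be replaced by child steps, while the second call returns an empty list for the unknown name.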

Document Index

This index caches references to all document nodes in a database. It provides fast access to single documents in large database instances.

Value Indexes

Value indexes can be dropped and created by the user:

Text Index

This index speeds up equality tests and simple range queries on text nodes in XPath location steps with predicates.
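As a rough illustration of why a value index helps, the following Python sketch maps text values to the elements that contain them, so that an equality test becomes a dictionary lookup and a simple range query becomes a scan over sorted keys. It is a toy model under stated assumptions, not how BaseX stores its index: the helper names are hypothetical, only the text directly preceding an element's first child is considered, and ranges are compared as strings.

```python
import bisect
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_text_index(root):
    """Map each non-empty, stripped text value to the elements holding it."""
    index = defaultdict(list)
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text:
            index[text].append(elem)
    return index


def lookup_equal(index, value):
    """Equality test: a single dictionary lookup instead of a document scan."""
    return index.get(value, [])


def lookup_range(index, low, high):
    """Simple range query: all elements whose text lies in [low, high] (string order)."""
    keys = sorted(index)
    start = bisect.bisect_left(keys, low)
    stop = bisect.bisect_right(keys, high)
    return [elem for key in keys[start:stop] for elem in index[key]]


# Hypothetical sample document, used only for illustration.
doc = ET.fromstring("<div><p>Usability</p><p>Testing</p><p>Design</p></div>")
text_index = build_text_index(doc)

equal_hits = [e.tag for e in lookup_equal(text_index, "Usability")]
range_hits = [e.text for e in lookup_range(text_index, "D", "U")]
print(equal_hits, range_hits)
```

A real index would keep the keys sorted persistently rather than sorting them on every range lookup; the sort is done inline here only to keep the sketch short.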

Attribute Index

This index speeds up equality tests and simple range queries on attribute values in XPath location steps with predicates.

Full-Text Index

This index speeds up queries using the contains text keyword. Internally, BaseX handles two different index structures. The default index sorts all keys alphabetically and by their character length; it is particularly fast if fuzzy searches are performed. The second index is a compressed trie structure, which needs slightly more memory but is specialized for wildcard searches.
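One plausible reason why ordering keys by length helps fuzzy searches is that two strings within an edit distance of k can differ in length by at most k, so only keys of similar length need to be compared. The Python sketch below illustrates that pruning idea. It is an assumption-laden toy, not the BaseX index: the bucket layout, the function names, and the use of plain Levenshtein distance with a fixed error bound are all choices made for this example.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def build_length_buckets(tokens):
    """Group index keys by their character length."""
    buckets = defaultdict(set)
    for token in tokens:
        buckets[len(token)].add(token)
    return buckets


def fuzzy_lookup(buckets, term, max_errors):
    """Compare the term only against keys whose length is within max_errors."""
    matches = []
    shortest = max(0, len(term) - max_errors)
    longest = len(term) + max_errors
    for length in range(shortest, longest + 1):
        for token in sorted(buckets.get(length, ())):
            if levenshtein(term, token) <= max_errors:
                matches.append(token)
    return matches


# Hypothetical index keys, used only for illustration.
buckets = build_length_buckets(["usability", "testing", "usable", "utility"])
result = fuzzy_lookup(buckets, "usebiliti", 2)
print(result)
```

With an error bound of 2, the key `usable` (length 6) is never compared against the nine-character search term, because its length alone rules it out.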

With Template:Mark, a new database option was introduced to support incremental indexing of texts and attributes.

Example Queries

The following queries are examples of expressions that will be optimized for index access (provided that the relevant index exists in the database):

Name/Path Index

//address is rewritten to addressbook/address if all address elements have an addressbook element as their only ancestor.
/non-existing-name is rewritten to an empty sequence.

Text Index

//node()[text() = 'Usability']
//div[p = 'Usability' or p = 'Testing']
path/to/relevant[text() = 'Usability Testing']/and/so/on

Attribute Index

//node()[@align = 'right']
descendant::elem[@id = '1']
range/query[@id >= 1 and @id <= 5]

Full-Text Index

//node[text() contains text 'Usability']
//node[text() contains text 'Usebiliti' using fuzzy]
//book[chapter contains text ('web' ftor 'WWW' using no stemming) ftand 'diversity' using stemming distance at most 5 words]

@@ Line 38: / Line 38: @@
 ===Name/Path Index===
-* {{Mono|//address}} is rewritten to {{Mono/addressbook/address}} if all {{Mono|address}} elements have an {{Mono|addressbook}} element as their only ancestor.
+* {{Mono|//address}} is rewritten to {{Mono|addressbook/address}} if all {{Mono|address}} elements have an {{Mono|addressbook}} element as their only ancestor.
 * {{Mono|/non-existing-name}} is rewritten to an empty sequence


