Storage Layout

From BaseX Documentation
Revision as of 18:59, 26 October 2011 by CG

Data Types

  • Num: compressed integer (1-5 bytes)
  • Token: length (Num), followed by the bytes of the UTF-8 representation
  • Double: number, stored as a Token
  • Boolean: 1 byte (00 or 01)
  • TokenSet: key array (Tokens), next/bucket/size arrays (Nums)
  • Nums, Tokens and Doubles: arrays of values, prefixed with the number of entries (Num)
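
The types above can be sketched in a few lines of Python. This is an illustration, not the BaseX implementation. The list only says that a Num takes 1-5 bytes, so the bit layout used here is an assumption: the top two bits of the first byte select a length of 1, 2, 4 or 5 bytes, and values are treated as unsigned 32-bit integers. All function names are hypothetical.

```python
from typing import List, Tuple


def write_num(value: int) -> bytes:
    """Encode an unsigned 32-bit integer as a Num (assumed layout, see above)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value out of range")
    if value < 0x40:                       # 00xxxxxx: 1 byte, 6 bits
        return bytes([value])
    if value < 0x4000:                     # 01xxxxxx + 1 byte: 14 bits
        return (value | 0x4000).to_bytes(2, "big")
    if value < 0x40000000:                 # 10xxxxxx + 3 bytes: 30 bits
        return (value | 0x80000000).to_bytes(4, "big")
    return b"\xC0" + value.to_bytes(4, "big")  # marker byte + 4 bytes: 32 bits


def read_num(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one Num at buf[pos]; return (value, position after it)."""
    first = buf[pos]
    kind = first >> 6                      # top two bits select the length
    if kind == 0:
        return first, pos + 1
    if kind == 1:
        return (first & 0x3F) << 8 | buf[pos + 1], pos + 2
    if kind == 2:
        rest = int.from_bytes(buf[pos + 1:pos + 4], "big")
        return (first & 0x3F) << 24 | rest, pos + 4
    return int.from_bytes(buf[pos + 1:pos + 5], "big"), pos + 5


def write_token(data: bytes) -> bytes:
    """Token: length as Num, then the raw bytes."""
    return write_num(len(data)) + data


def read_token(buf: bytes, pos: int) -> Tuple[bytes, int]:
    length, pos = read_num(buf, pos)
    return buf[pos:pos + length], pos + length


def read_bool(buf: bytes, pos: int) -> Tuple[bool, int]:
    """Boolean: a single byte, 00 or 01."""
    return buf[pos] == 1, pos + 1


def read_tokens(buf: bytes, pos: int) -> Tuple[List[bytes], int]:
    """Tokens: number of entries as Num, then that many Token values."""
    count, pos = read_num(buf, pos)
    tokens = []
    for _ in range(count):
        token, pos = read_token(buf, pos)
        tokens.append(token)
    return tokens, pos
```

Every reader returns the decoded value together with the next read position, so calls can be chained over a single byte buffer. For example, read_token(write_token("title".encode("utf-8")), 0) returns the original bytes and the position after them.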

inf.basex

Contents: Meta information on a database and main memory indexes.

1. Meta Data
   Format: Key/value pairs, suffixed by empty key (Token/Token):
     • PERM → User Permissions
   Method: DiskData(), MetaData(), Users()

2. Main memory indexes
   Format: Key/value pairs, suffixed by empty key (Token/Token):
     • TAGS → Tag Index
     • ATTS → Attribute Index
     • PATH → Path Index
     • NS → Namespaces
     • DOCS → Document Index
   Method: DiskData()

2.1. Name Index (tag/attribute names)
   Format:
     1. Token set, storing all names (TokenSet)
     2. One StatsKey instance per entry:
       2.1. Content kind (Num):
         2.1.1. Number: min/max (Doubles)
         2.1.2. Category: number of entries (Num), entries (Tokens)
       2.2. Number of entries (Num)
       2.3. Leaf flag (Boolean)
       2.4. Maximum text length (Double; legacy, could be Num)
   Method: Names(), TokenSet.read(), StatsKey()

2.2. Path Index
   Format:
     1. Flag for path definition (Boolean, always true; legacy)
     2. PathNode:
       2.1. Name reference (Num)
       2.2. Node kind (Num)
       2.3. Number of occurrences (Num)
       2.4. Number of children (Num)
       2.5. Double; legacy, can be reused or discarded
       2.6. Recursive generation of child nodes (→ 2)
   Method: PathSummary(), PathNode()

2.3. Namespaces
   Format:
     1. Token set, storing prefixes (TokenSet)
     2. Token set, storing URIs (TokenSet)
     3. NSNode:
       3.1. pre value (Num)
       3.2. References to prefix/URI pairs (Nums)
       3.3. Number of children (Num)
       3.4. Recursive generation of child nodes (→ 3)
   Method: Namespaces(), NSNode()

2.4. Document Index
   Format: Array of integers, representing the distances between all document pre values (Nums)
   Method: DocIndex()
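
Three of the structures listed above can be illustrated with a short Python sketch: a key/value section, the recursive PathNode layout and the document index. It is not the BaseX code, and it rests on several assumptions. The Num layout is assumed to use the top two bits of the first byte as a length selector. Every value in a key/value section is read as a plain Token. The empty key is assumed to be written without a value. The first document distance is taken relative to pre value 0. All names are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import accumulate
from typing import Dict, List, Tuple


def read_num(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a Num (assumed layout: top two bits of the first byte give the length)."""
    first = buf[pos]
    kind = first >> 6
    if kind == 0:
        return first, pos + 1
    if kind == 1:
        return (first & 0x3F) << 8 | buf[pos + 1], pos + 2
    if kind == 2:
        return (first & 0x3F) << 24 | int.from_bytes(buf[pos + 1:pos + 4], "big"), pos + 4
    return int.from_bytes(buf[pos + 1:pos + 5], "big"), pos + 5


def read_token(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Token: length as Num, then the raw bytes."""
    length, pos = read_num(buf, pos)
    return buf[pos:pos + length], pos + length


def read_pairs(buf: bytes, pos: int) -> Tuple[Dict[bytes, bytes], int]:
    """Read Token/Token pairs until an empty key is found."""
    pairs: Dict[bytes, bytes] = {}
    while True:
        key, pos = read_token(buf, pos)
        if not key:                        # empty key closes the section
            return pairs, pos
        pairs[key], pos = read_token(buf, pos)


@dataclass
class PathNode:
    name: int                              # 2.1 name reference
    kind: int                              # 2.2 node kind
    count: int                             # 2.3 number of occurrences
    children: List["PathNode"] = field(default_factory=list)


def read_path_node(buf: bytes, pos: int) -> Tuple[PathNode, int]:
    """Read one PathNode and, recursively, all of its children."""
    name, pos = read_num(buf, pos)
    kind, pos = read_num(buf, pos)
    count, pos = read_num(buf, pos)
    n_children, pos = read_num(buf, pos)   # 2.4 number of children
    _legacy, pos = read_token(buf, pos)    # 2.5 legacy Double, stored as a Token
    node = PathNode(name, kind, count)
    for _ in range(n_children):            # 2.6 recursion
        child, pos = read_path_node(buf, pos)
        node.children.append(child)
    return node, pos


def read_doc_index(buf: bytes, pos: int) -> Tuple[List[int], int]:
    """Read the distances (Nums) and turn them into absolute pre values."""
    count, pos = read_num(buf, pos)
    distances = []
    for _ in range(count):
        distance, pos = read_num(buf, pos)
        distances.append(distance)
    return list(accumulate(distances)), pos
```

Storing distances instead of absolute pre values keeps most entries small, so each of them fits into a short Num.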

(tbl|tbli).basex

Contents: Main database table and directory.

txt.basex

Contents: Heap file with text values (document names, string values of texts, comments and processing instructions).

atv.basex

Contents: Heap file with attribute values.

(txtl|txtr).basex

Contents: Value index for texts.

(atvl|atvr).basex

Contents: Value index for attributes.

(ftxa|ftxb|ftxc).basex

Contents: Trie full-text index.

(ftxx|ftxy|ftxz).basex

Contents: Fuzzy full-text index.