Storage Layout

From BaseX Documentation

Revision as of 19:27, 26 October 2011

The following data types are used for specifying the storage layout:

  • Num: compressed integer (1-5 bytes)
  • Token: length (Num), followed by the bytes of the UTF-8 representation
  • Double: floating-point number, stored as a Token
  • Boolean: boolean (1 byte, 00 or 01)
  • TokenSet: key array (Tokens), next/bucket/size arrays (Nums)
  • Nums, Tokens and Doubles: arrays of values, preceded by the number of entries (Num)
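The primitive types can be illustrated with a small reader. This is a sketch under assumptions, not BaseX code: the list above only says that a Num takes 1 to 5 bytes, so the bit layout used here (the two high bits of the first byte act as a length tag: 00 → 1 byte, 01 → 2 bytes, 10 → 4 bytes, 11 → marker byte plus 4 raw bytes) is an assumption to be checked against the BaseX sources. `write_num`, `write_token` and `LayoutReader` are hypothetical helpers.

```python
import io


def write_num(v: int) -> bytes:
    """Encode a 32-bit int as a Num (ASSUMED 2-bit length tag in the first byte)."""
    if not -2**31 <= v < 2**31:
        raise ValueError("Num values are 32-bit integers")
    if v < 0 or v > 0x3FFFFFFF:           # tag 11: marker byte + 4 raw bytes
        return bytes([0xC0]) + (v & 0xFFFFFFFF).to_bytes(4, "big")
    if v > 0x3FFF:                         # tag 10: 30 value bits in 4 bytes
        return (v | 0x80000000).to_bytes(4, "big")
    if v > 0x3F:                           # tag 01: 14 value bits in 2 bytes
        return (v | 0x4000).to_bytes(2, "big")
    return bytes([v])                      # tag 00: 6 value bits in 1 byte


def write_token(s: str) -> bytes:
    """Encode a Token: length as Num, then the UTF-8 bytes."""
    raw = s.encode("utf-8")
    return write_num(len(raw)) + raw


class LayoutReader:
    """Reads the primitive types from a byte buffer (illustrative only)."""

    def __init__(self, data: bytes):
        self.buf = io.BytesIO(data)

    def _read(self, n: int) -> bytes:
        chunk = self.buf.read(n)
        if len(chunk) != n:
            raise EOFError("unexpected end of input")
        return chunk

    def read_num(self) -> int:
        first = self._read(1)[0]
        tag = first >> 6
        if tag == 0:
            return first
        if tag == 1:
            return (first & 0x3F) << 8 | self._read(1)[0]
        if tag == 2:
            return (first & 0x3F) << 24 | int.from_bytes(self._read(3), "big")
        v = int.from_bytes(self._read(4), "big")
        return v - 2**32 if v >= 2**31 else v    # reinterpret as signed 32-bit

    def read_token(self) -> str:
        return self._read(self.read_num()).decode("utf-8")

    def read_double(self) -> float:
        return float(self.read_token())          # Doubles are stored as Tokens

    def read_bool(self) -> bool:
        b = self._read(1)[0]
        if b not in (0, 1):
            raise ValueError(f"invalid Boolean byte: {b}")
        return b == 1

    def read_nums(self) -> list:
        # the entry count (Num) precedes the values
        return [self.read_num() for _ in range(self.read_num())]

    def read_tokens(self) -> list:
        return [self.read_token() for _ in range(self.read_num())]
```

Small values (0 to 63) fit into a single byte under this scheme, which keeps counts, lengths and references compact.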

The following tables present the layout of BaseX database files. All files are suffixed with .basex.

Meta Data, Name/Path/Doc Indexes: inf

1. Meta Data
  Format:
    1. Key/value pairs (Token/Token):
      • PERM → number of users (Num), followed by name/password/permission values for each user (Token/Token/Num)
    2. Empty key as finalizer
  Methods: DiskData(), MetaData(), Users()

2. Main memory indexes
  Format:
    1. Key/value pairs (Token/Token):
      • TAGS → Tag Index
      • ATTS → Attribute Name Index
      • PATH → Path Index
      • NS → Namespaces
      • DOCS → Document Index
    2. Empty key as finalizer
  Method: DiskData()

2 a) Name Index (tag/attribute names)
  Format:
    1. Token set storing all names (TokenSet)
    2. One StatsKey instance per entry:
      2.1. Content kind (Num):
        2.1.1. Number: min/max (Doubles)
        2.1.2. Category: number of entries (Num), entries (Tokens)
      2.2. Number of entries (Num)
      2.3. Leaf flag (Boolean)
      2.4. Maximum text length (Double; legacy, could be Num)
  Methods: Names(), TokenSet.read(), StatsKey()

2 b) Path Index
  Format:
    1. Flag for path definition (Boolean, always true; legacy)
    2. PathNode:
      2.1. Name reference (Num)
      2.2. Node kind (Num)
      2.3. Number of occurrences (Num)
      2.4. Number of children (Num)
      2.5. Legacy value (Double); can be reused or discarded
      2.6. Child nodes, each stored recursively as a PathNode (→ 2)
  Methods: PathSummary(), PathNode()

2 c) Namespaces
  Format:
    1. Token set storing prefixes (TokenSet)
    2. Token set storing URIs (TokenSet)
    3. NSNode:
      3.1. pre value (Num)
      3.2. References to prefix/URI pairs (Nums)
      3.3. Number of children (Num)
      3.4. Child nodes, each stored recursively as an NSNode (→ 3)
  Methods: Namespaces(), NSNode()

2 d) Document Index
  Format: array of integers representing the distances between consecutive document pre values (Nums)
  Method: DocIndex()
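Three patterns recur in the inf file: key/value sections closed by an empty key, recursively nested nodes, and delta-encoded pre values. The sketch below shows them on values that are assumed to be already decoded (an iterator of Python strings and numbers stands in for the Tokens and Nums); it is not the BaseX reading code. The function names are hypothetical, keys without a handler are assumed to carry a single Token value, and the Document Index decoding assumes that the first distance is measured from pre value 0.

```python
from itertools import accumulate
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Handler = Callable[[Iterator[Any]], Any]


def read_key_values(items: Iterator[Any],
                    handlers: Optional[Dict[str, Handler]] = None) -> Dict[str, Any]:
    """Collect key/value pairs until the empty key (the finalizer) is read."""
    handlers = handlers or {}
    result: Dict[str, Any] = {}
    for key in items:
        if key == "":
            return result
        handler = handlers.get(key)
        # assumption: plain keys have exactly one Token value
        result[key] = handler(items) if handler else next(items)
    raise EOFError("missing empty-key finalizer")


def read_perm(items: Iterator[Any]) -> List[Tuple[Any, Any, Any]]:
    """PERM value: number of users, then name/password/permission per user."""
    count = next(items)
    return [(next(items), next(items), next(items)) for _ in range(count)]


def read_path_node(items: Iterator[Any]) -> Dict[str, Any]:
    """One PathNode (fields 2.1 to 2.6), reading its children recursively."""
    name, kind, count, n_children = next(items), next(items), next(items), next(items)
    next(items)  # 2.5: legacy Double, skipped
    children = [read_path_node(items) for _ in range(n_children)]
    return {"name": name, "kind": kind, "count": count, "children": children}


def doc_pres(distances: List[int]) -> List[int]:
    """Document Index: running sum of the stored distances."""
    return list(accumulate(distances))
```

For example, stored distances `[0, 5, 3]` would yield the document pre values `[0, 5, 8]`. NSNode trees (→ 3) can be read with the same recursion as `read_path_node`.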

Node Table: tbl, tbli

  • tbl: Main database table, stored in blocks.
  • tbli: Database directory, organizing the database blocks.
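The role of the directory can be sketched as follows: it records, for each block, the pre value of its first node and the position of the block in tbl, so a node is located by a binary search instead of a scan. The record size (16 bytes), block size (4096 bytes) and the exact contents of the directory are assumptions made for illustration; `BlockDirectory` is a hypothetical class, not the BaseX implementation.

```python
from bisect import bisect_right
from typing import List

RECORD_SIZE = 16                          # assumption: bytes per node record
BLOCK_SIZE = 4096                         # assumption: bytes per block
RECORDS_PER_BLOCK = BLOCK_SIZE // RECORD_SIZE


class BlockDirectory:
    """Hypothetical in-memory model of the tbli directory."""

    def __init__(self, first_pres: List[int], physical: List[int], size: int):
        # first_pres[i]: pre value of the first record in logical block i
        # physical[i]:   position of that block inside the tbl file
        # size:          total number of node records in the table
        if len(first_pres) != len(physical) or not first_pres or first_pres[0] != 0:
            raise ValueError("invalid directory")
        self.first_pres, self.physical, self.size = first_pres, physical, size

    def offset(self, pre: int) -> int:
        """Byte offset of the record for `pre` inside the tbl file."""
        if not 0 <= pre < self.size:
            raise KeyError(pre)
        i = bisect_right(self.first_pres, pre) - 1   # last block starting <= pre
        slot = pre - self.first_pres[i]
        if slot >= RECORDS_PER_BLOCK:
            raise ValueError("directory is inconsistent with the block size")
        return self.physical[i] * BLOCK_SIZE + slot * RECORD_SIZE
```

Keeping logical and physical block order separate in this model means that inserting nodes only touches one block and the directory, not the rest of tbl; whether BaseX follows exactly this design should be checked in its sources.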

Texts: txt, atv

  • txt: Heap file for text values (document names, string values of texts, comments and processing instructions)
  • atv: Heap file for attribute values.
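A heap file appends values one after another and addresses each one by its byte offset, so a node record only needs to store a fixed-size reference. The sketch below uses a fixed 4-byte length prefix for simplicity; the actual record encoding of txt and atv is not specified here and may differ (for example, it may use compressed lengths). `HeapFile` is a hypothetical class.

```python
class HeapFile:
    """Hypothetical append-only heap of length-prefixed UTF-8 values."""

    def __init__(self) -> None:
        self.data = bytearray()

    def add(self, value: str) -> int:
        """Append a value and return its offset (what a node record would keep)."""
        raw = value.encode("utf-8")
        offset = len(self.data)
        self.data += len(raw).to_bytes(4, "big") + raw
        return offset

    def get(self, offset: int) -> str:
        """Read the value stored at `offset`."""
        length = int.from_bytes(self.data[offset:offset + 4], "big")
        start = offset + 4
        return self.data[start:start + length].decode("utf-8")
```

Usage: `h = HeapFile(); ref = h.add("hello")` stores the text, and `h.get(ref)` retrieves it again.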

Value Indexes: txtl, txtr, atvl, atvr

Text Index:

  • txtl: Heap file with ID lists.
  • txtr: Index file with references to ID lists.

The Attribute Index is contained in the files atvl and atvr; it uses the same layout.
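The split into an ID-list heap and a reference file can be modelled in memory: each distinct value gets one ID list (its pre values in ascending order), the lists are concatenated into one heap (the role of txtl), and a sorted array of references points to the start of each list (the role of txtr), so lookups are binary searches. This is a simplified sketch: the keys are kept in a separate Python list here, whereas the real files may resolve keys differently, and the list layout (length followed by IDs) is an assumption. All names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Tuple

Index = Tuple[List[str], List[int], List[int]]  # keys, references, ID heap


def build_value_index(texts: Dict[int, str]) -> Index:
    """Build sorted keys, references into the ID heap, and the ID heap itself.

    `texts` maps pre values to text values; each ID list in the heap is
    stored as its length followed by the ascending pre values.
    """
    lists: Dict[str, List[int]] = defaultdict(list)
    for pre in sorted(texts):
        lists[texts[pre]].append(pre)
    keys = sorted(lists)
    references: List[int] = []
    id_heap: List[int] = []
    for key in keys:
        references.append(len(id_heap))
        id_heap.append(len(lists[key]))
        id_heap.extend(lists[key])
    return keys, references, id_heap


def lookup(index: Index, value: str) -> List[int]:
    """Return the pre values of all nodes whose text equals `value`."""
    keys, references, id_heap = index
    i = bisect_left(keys, value)
    if i == len(keys) or keys[i] != value:
        return []
    start = references[i]
    return id_heap[start + 1:start + 1 + id_heap[start]]
```

The same construction applies to attribute values, matching the statement that atvl and atvr use the same layout.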

Full-Text Fuzzy Index: ftxx, ftxy, ftxz

...will soon be reimplemented.

Full-Text Trie Index: ftxa, ftxb, ftxc

...will soon be dropped.