Difference between revisions of "Storage Layout"

From BaseX Documentation
Line 3: Line 3:
  ==Data Types==
  * {{Type|Num}}: compressed integer (1-5 bytes)
- * {{Type|Token}}: length (<code>Num</code>) and bytes of UTF8 byte representation
+ * {{Type|Token}}: length ({{Type|Num}}) and bytes of UTF8 byte representation
- * {{Type|double}}: number, stored as token
+ * {{Type|Double}}: number, stored as token
- * {{Type|boolean}}: boolean (1 byte, <code>00</code> or <code>01</code>)
+ * {{Type|Boolean}}: boolean (1 byte, <code>00</code> or <code>01</code>)
  * {{Type|TokenSet}}: key array (<code>Tokens</code>), next/bucket/size arrays (<code>Nums</code>)
+ * {{Type|Nums}}, {{Type|Tokens}} and {{Type|Doubles}} are arrays of values, and introduced with the number of entries ({{Type|Num}})

  ==inf.basex==
Line 27: Line 28:
  | valign='top' | [https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/data/DiskData.java DiskData()]
  |-
- | valign='top' | '''2.1. Name Index'''<br/>Element/attribute names
+ | valign='top' | '''2.1. Name Index'''<br/>Tag/attribute names
- | 1. Token set, storing all names ({{Type|TokenSet}})<br />2. One StatsKey instance per entry:<br/>2.1. Content kind ({{Type|Num}})<br />2.1.1. Number: min/max ({{Type|Doubles}})<br />2.1.2. Category: number of entries ({{Type|Num}}), entries ({{Type|Tokens}})<br />2.2. Number of entries ({{Type|Num}})<br />2.3. Leaf flag ({{Type|Boolean}})<br />2.4. Maximum text length ({{Type|Double}}; legacy, could be {{Type|Num}})
+ | 1. Token set, storing all names ({{Type|TokenSet}})<br />2. One StatsKey instance per entry:<br/>2.1. Content kind ({{Type|Num}}):<br />2.1.1. Number: min/max ({{Type|Doubles}})<br />2.1.2. Category: number of entries ({{Type|Num}}), entries ({{Type|Tokens}})<br />2.2. Number of entries ({{Type|Num}})<br />2.3. Leaf flag ({{Type|Boolean}})<br />2.4. Maximum text length ({{Type|Double}}; legacy, could be {{Type|Num}})
  | valign='top' | [https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/index/Names.java Names()]<br/>[https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/util/hash/TokenSet.java TokenSet.read()]<br/>[https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/index/StatsKey.java StatsKey()]
  |-
Line 38: Line 39:
  | 1. Token set, storing prefixes ({{Type|TokenSet}})<br/>2. Token set, storing URIs ({{Type|TokenSet}})<br/>3. NSNode:<br/>3.1. pre value ({{Type|Num}})<br/>3.2. References to prefix/URI pairs ({{Type|Nums}})<br/>3.3. Number of children ({{Type|Num}})<br/>3.4. Recursive generation of child nodes (→ 3)
  | valign='top' | [https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/data/Namespaces.java Namespaces()]<br/>[https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/data/NSNode.java NSNode()]
+ |-
+ | valign='top' | '''2.4. Document Index'''
+ | Array of integers, representing the distances between all document pre values ({{Type|Nums}})
+ | valign='top' | [https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/index/DocIndex.java DocIndex()]
  |}

Revision as of 18:51, 26 October 2011

Version: 7.0

Data Types

  • Num: compressed integer (1-5 bytes)
  • Token: length (Num), followed by the bytes of the UTF-8 representation
  • Double: number, stored as token
  • Boolean: boolean (1 byte, 00 or 01)
  • TokenSet: key array (Tokens), next/bucket/size arrays (Nums)
  • Nums, Tokens and Doubles are arrays of values, each introduced by its number of entries (Num)
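
The data types listed above can be read with a handful of small decoders. The following Java sketch is illustrative only and is not the BaseX API: the class name `StorageTypesSketch` and its methods are hypothetical. The `Num` layout used here (the two highest bits of the first byte select a 1-, 2-, 4- or 5-byte form) is an assumption, as this page only states that a Num takes 1-5 bytes. The sketch also assumes that a Double token holds plain decimal text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical reader for the primitive storage types; not the BaseX API. */
public class StorageTypesSketch {
  private final ByteBuffer in;

  public StorageTypesSketch(byte[] data) {
    in = ByteBuffer.wrap(data); // big-endian by default
  }

  /** Reads one unsigned byte. */
  private int readByte() {
    return in.get() & 0xFF;
  }

  /** Num: ASSUMED layout, the top two bits of the first byte choose the length. */
  public int readNum() {
    int first = readByte();
    switch (first >>> 6) {
      case 0:  // 1 byte, 6 value bits
        return first;
      case 1:  // 2 bytes, 14 value bits
        return (first & 0x3F) << 8 | readByte();
      case 2:  // 4 bytes, 30 value bits
        return (first & 0x3F) << 24 | readByte() << 16 | readByte() << 8 | readByte();
      default: // 5 bytes: marker byte followed by a full 32-bit integer
        return in.getInt();
    }
  }

  /** Token: length (Num), followed by that many UTF-8 bytes. */
  public String readToken() {
    byte[] bytes = new byte[readNum()];
    in.get(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  /** Double: stored as a token (ASSUMED to be plain decimal text). */
  public double readDouble() {
    return Double.parseDouble(readToken());
  }

  /** Boolean: a single byte, 00 or 01. */
  public boolean readBoolean() {
    return readByte() == 1;
  }

  /** Nums: number of entries (Num), then the entries themselves. */
  public int[] readNums() {
    int[] values = new int[readNum()];
    for (int i = 0; i < values.length; i++) values[i] = readNum();
    return values;
  }
}
```

Tokens and Doubles arrays would follow the same pattern as `readNums()`: read the entry count as a Num, then read that many values.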

inf.basex

{|
! Description
! Format
! Method
|-
| valign='top' | Disk Data
| Database meta information
| valign='top' | DiskData()
|-
| valign='top' | 1. Meta Data
| Key/value pairs, suffixed by empty key (Token/Token):<br/>PERM → User Permissions
| valign='top' | MetaData()<br/>Users()
|-
| valign='top' | 2. Main memory indexes
| Key/value pairs, suffixed by empty key (Token/Token):<br/>TAGS → Tag Index<br/>ATTS → Attribute Index<br/>PATH → Path Index<br/>NS → Namespaces<br/>DOCS → Document Index
| valign='top' | DiskData()
|-
| valign='top' | 2.1. Name Index<br/>Tag/attribute names
| 1. Token set, storing all names (TokenSet)<br/>2. One StatsKey instance per entry:<br/>2.1. Content kind (Num):<br/>2.1.1. Number: min/max (Doubles)<br/>2.1.2. Category: number of entries (Num), entries (Tokens)<br/>2.2. Number of entries (Num)<br/>2.3. Leaf flag (Boolean)<br/>2.4. Maximum text length (Double; legacy, could be Num)
| valign='top' | Names()<br/>TokenSet.read()<br/>StatsKey()
|-
| valign='top' | 2.2. Path Index
| 1. Flag for path definition (Boolean, always true; legacy)<br/>2. PathNode:<br/>2.1. Name reference (Num)<br/>2.2. Node kind (Num)<br/>2.3. Number of occurrences (Num)<br/>2.4. Number of children (Num)<br/>2.5. Double; legacy, can be reused or discarded<br/>2.6. Recursive generation of child nodes (→ 2)
| valign='top' | PathSummary()<br/>PathNode()
|-
| valign='top' | 2.3. Namespaces
| 1. Token set, storing prefixes (TokenSet)<br/>2. Token set, storing URIs (TokenSet)<br/>3. NSNode:<br/>3.1. pre value (Num)<br/>3.2. References to prefix/URI pairs (Nums)<br/>3.3. Number of children (Num)<br/>3.4. Recursive generation of child nodes (→ 3)
| valign='top' | Namespaces()<br/>NSNode()
|-
| valign='top' | 2.4. Document Index
| Array of integers, representing the distances between all document pre values (Nums)
| valign='top' | DocIndex()
|}
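
Two of the layouts above lend themselves to a short illustration: the recursive NSNode structure (2.3) and the distance-coded Document Index (2.4). The Java sketch below works on values that have already been decoded from Nums, so it does not depend on the byte-level encoding. All names (`IndexSketch`, `NumCursor`, `NSNodeSketch`, `readNode`, `toPreValues`) are hypothetical and not part of BaseX. It also assumes that the first distance in the Document Index is counted from pre value 0, which this page does not state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the recursive and distance-coded layouts; not the BaseX API. */
public class IndexSketch {

  /** Cursor over a sequence of already decoded Num values. */
  public static final class NumCursor {
    private final int[] nums;
    private int pos;

    public NumCursor(int... nums) {
      this.nums = nums;
    }

    public int next() {
      return nums[pos++];
    }
  }

  /** Mirrors the NSNode layout: pre value, prefix/URI references, child nodes. */
  public static final class NSNodeSketch {
    public int pre;
    public int[] refs;
    public final List<NSNodeSketch> children = new ArrayList<>();
  }

  /** Reads one node and, recursively, all of its children. */
  public static NSNodeSketch readNode(NumCursor in) {
    NSNodeSketch node = new NSNodeSketch();
    node.pre = in.next();                  // 3.1. pre value (Num)
    node.refs = new int[in.next()];        // 3.2. Nums: entry count comes first
    for (int i = 0; i < node.refs.length; i++) node.refs[i] = in.next();
    int childCount = in.next();            // 3.3. number of children (Num)
    for (int i = 0; i < childCount; i++) {
      node.children.add(readNode(in));     // 3.4. recursion, back to step 3
    }
    return node;
  }

  /** Turns distances into absolute pre values; ASSUMES the first distance starts at 0. */
  public static int[] toPreValues(int[] distances) {
    int[] pres = new int[distances.length];
    int pre = 0;
    for (int i = 0; i < distances.length; i++) {
      pre += distances[i];
      pres[i] = pre;
    }
    return pres;
  }
}
```

Storing distances rather than absolute pre values keeps the individual numbers small, which suits a compressed integer type. A PathNode (2.2) could be read with the same recursive pattern, apart from its legacy Double field, which is stored as a token rather than a Num.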