Difference between revisions of "Storage Layout"
Revision as of 19:48, 26 October 2011
Data Types
The following data types are used for specifying the storage layout:
| Type | Description | Example (native → hex integers) |
|---|---|---|
| Num | Compressed integer (1-5 bytes), specified in Num.java | 15 → 0F; 511 → 41 FF |
| Token | Length (Num) and bytes of UTF8 byte representation | Hello → 05 48 65 6c 6c 6f |
| Double | Number, stored as token | 123 → 03 31 32 33 |
| Boolean | Boolean (1 byte, 00 or 01) | true → 01 |
| Nums, Tokens, Doubles | Arrays of values, introduced with the number of entries | 1,2 → 02 01 31 01 32 |
| TokenSet | Key array (Tokens), next/bucket/size arrays (3x Nums) | |
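The examples in the table can be reproduced with a short encoder. The following is a minimal sketch, not the code from Num.java: it assumes that the two highest bits of the first byte select the length of a Num (1, 2, 4 or 5 bytes), which is consistent with the 15 → 0F and 511 → 41 FF examples but is not spelled out on this page. The class and method names (StorageTypesSketch, num, token, tokens, hex) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class StorageTypesSketch {
  /** Encodes an int as a Num. Assumption: the top two bits of the first byte select the length. */
  static byte[] num(int v) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (v < 0 || v > 0x3FFFFFFF) {
      // 5 bytes: marker byte C0, followed by the full 32-bit value
      out.write(0xC0);
      out.write(v >>> 24);
      out.write(v >>> 16);
      out.write(v >>> 8);
    } else if (v > 0x3FFF) {
      // 4 bytes: prefix bits 10, followed by a 30-bit value
      out.write(v >>> 24 | 0x80);
      out.write(v >>> 16);
      out.write(v >>> 8);
    } else if (v > 0x3F) {
      // 2 bytes: prefix bits 01, followed by a 14-bit value
      out.write(v >>> 8 | 0x40);
    }
    // lowest byte; for values up to 3F this is the only byte (prefix bits 00)
    out.write(v);
    return out.toByteArray();
  }

  /** Encodes a Token: byte length as Num, then the UTF-8 bytes. */
  static byte[] token(String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    byte[] len = num(bytes.length);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(len, 0, len.length);
    out.write(bytes, 0, bytes.length);
    return out.toByteArray();
  }

  /** Encodes a Tokens array: number of entries as Num, then each entry as Token. */
  static byte[] tokens(String... values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] count = num(values.length);
    out.write(count, 0, count.length);
    for (String value : values) {
      byte[] t = token(value);
      out.write(t, 0, t.length);
    }
    return out.toByteArray();
  }

  /** Formats bytes as upper-case hex pairs separated by spaces. */
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(String.format("%02X", b & 0xFF));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(hex(num(15)));          // 0F
    System.out.println(hex(num(511)));         // 41 FF
    System.out.println(hex(token("Hello")));   // 05 48 65 6C 6C 6F
    System.out.println(hex(token("123")));     // 03 31 32 33 (a Double is stored as token)
    System.out.println(hex(tokens("1", "2"))); // 02 01 31 01 32
  }
}
```

The sketch prints hex digits in upper case, whereas the table mixes cases; the byte values are the same. For the 4-byte and 5-byte branches, Num.java remains the authoritative reference.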
Database Files
The following tables illustrate the layout of the BaseX database files. All files are suffixed with .basex.
Meta Data, Name/Path/Doc Indexes: inf
| Description | Format | Method |
|---|---|---|
| 1. Meta Data | 1. Key/value pairs, in no particular order (Token/Token):<br/>• Examples: FNAME, TIME, SIZE, ...<br/>• PERM → Number of users (Num), and name/password/permission values for each user (Token/Token/Num)<br/>2. Empty key as finalizer | DiskData()<br/>MetaData()<br/>Users() |
| 2. Main memory indexes | 1. Key/value pairs, in no particular order (Token/Token):<br/>• TAGS → Tag Index<br/>• ATTS → Attribute Name Index<br/>• PATH → Path Index<br/>• NS → Namespaces<br/>• DOCS → Document Index<br/>2. Empty key as finalizer | DiskData() |
| 2 a) Name Index<br/>Tag/attribute names | 1. Token set, storing all names (TokenSet)<br/>2. One StatsKey instance per entry:<br/>2.1. Content kind (Num):<br/>2.1.1. Number: min/max (Doubles)<br/>2.1.2. Category: number of entries (Num), entries (Tokens)<br/>2.2. Number of entries (Num)<br/>2.3. Leaf flag (Boolean)<br/>2.4. Maximum text length (Double; legacy, could be Num) | Names()<br/>TokenSet.read()<br/>StatsKey() |
| 2 b) Path Index | 1. Flag for path definition (Boolean, always true; legacy)<br/>2. PathNode:<br/>2.1. Name reference (Num)<br/>2.2. Node kind (Num)<br/>2.3. Number of occurrences (Num)<br/>2.4. Number of children (Num)<br/>2.5. Double; legacy, can be reused or discarded<br/>2.6. Recursive generation of child nodes (→ 2) | PathSummary()<br/>PathNode() |
| 2 c) Namespaces | 1. Token set, storing prefixes (TokenSet)<br/>2. Token set, storing URIs (TokenSet)<br/>3. NSNode:<br/>3.1. pre value (Num)<br/>3.2. References to prefix/URI pairs (Nums)<br/>3.3. Number of children (Num)<br/>3.4. Recursive generation of child nodes (→ 3) | Namespaces()<br/>NSNode() |
| 2 d) Document Index | Array of integers, representing the distances between all document pre values (Nums) | DocIndex() |
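The recursive Path Index layout in row 2 b) can be read back with a small recursive-descent reader. This is a sketch under assumptions rather than the code in PathSummary.java or PathNode.java: the Num decoding assumes the same two-bit length prefix that the examples in the Data Types table suggest, and the class, field and method names (PathIndexSketch, Node, readNum, readToken) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class PathIndexSketch {
  /** Hypothetical in-memory node; field names are illustrative, not taken from PathNode.java. */
  static final class Node {
    int name;      // 2.1. name reference
    int kind;      // 2.2. node kind
    int count;     // 2.3. number of occurrences
    String legacy; // 2.5. legacy Double, stored as token
    final List<Node> children = new ArrayList<>();
  }

  private final byte[] data;
  private int pos;

  PathIndexSketch(byte[] data) { this.data = data; }

  private int readByte() { return data[pos++] & 0xFF; }

  /** Decodes a Num. Assumption: the top two bits of the first byte select the length. */
  private int readNum() {
    int b = readByte();
    switch (b >>> 6) {
      case 0:  return b;                                   // 1 byte
      case 1:  return (b & 0x3F) << 8 | readByte();        // 2 bytes
      case 2:  return (b & 0x3F) << 24 | readByte() << 16  // 4 bytes
                   | readByte() << 8 | readByte();
      default: return readByte() << 24 | readByte() << 16  // 5 bytes
                   | readByte() << 8 | readByte();
    }
  }

  /** Decodes a Token: byte length as Num, then the UTF-8 bytes. */
  private String readToken() {
    int len = readNum();
    String s = new String(data, pos, len, StandardCharsets.UTF_8);
    pos += len;
    return s;
  }

  /** Reads one PathNode (step 2) and, recursively, all of its children (step 2.6). */
  private Node readNode() {
    Node n = new Node();
    n.name = readNum();
    n.kind = readNum();
    n.count = readNum();
    int children = readNum();   // 2.4. number of children
    n.legacy = readToken();     // 2.5. legacy value, kept but not interpreted
    for (int c = 0; c < children; c++) n.children.add(readNode());
    return n;
  }

  /** Reads the leading flag (step 1) and the root node; returns null if the flag is not set. */
  static Node read(byte[] bytes) {
    PathIndexSketch in = new PathIndexSketch(bytes);
    boolean defined = in.readByte() == 1;
    return defined ? in.readNode() : null;
  }

  public static void main(String[] args) {
    // Flag, then a root node with one child; all field values are arbitrary sample data.
    byte[] bytes = { 1, 1, 1, 3, 1, 1, 0x30, 2, 1, 0x41, (byte) 0xFF, 0, 1, 0x30 };
    Node root = read(bytes);
    System.out.println(root.children.size());       // 1
    System.out.println(root.children.get(0).count); // 511
  }
}
```

The NSNode structure in row 2 c) follows the same pattern (a fixed set of fields, a child count, then the children), so a reader for it could be written along the same lines.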
Node Table: tbl, tbli
* tbl: Main database table, stored in blocks.
* tbli: Database directory, organizing the database blocks.
Texts: txt, atv
* txt: Heap file for text values (document names, string values of texts, comments and processing instructions).
* atv: Heap file for attribute values.
Value Indexes: txtl, txtr, atvl, atvr
Text Index:
* txtl: Heap file with ID lists.
* txtr: Index file with references to ID lists.
The Attribute Index is contained in the files atvl and atvr; it uses the same layout.
Full-Text Fuzzy Index: ftxx, ftxy, ftxz
...will soon be reimplemented.
Full-Text Trie Index: ftxa, ftxb, ftxc
...will soon be dismissed.