Storage Layout
Revision as of 19:15, 26 October 2011
The following data types are used for specifying the storage layout:
* Num: compressed integer (1-5 bytes)
* Token: length (Num) and bytes of the UTF-8 byte representation
* Double: number, stored as token
* Boolean: boolean (1 byte, 00 or 01)
* TokenSet: key array (Tokens), next/bucket/size arrays (Nums)
* Nums, Tokens and Doubles are arrays of values, introduced with the number of entries (Num)
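
The following Python sketch shows how these types could be read from a byte stream. The page only states that a Num occupies 1-5 bytes; the 2-bit length prefix used in `read_num` is an assumption and should be checked against the BaseX sources. All function names are hypothetical and are not part of BaseX.

```python
import io


def read_num(stream):
    """Read a compressed integer (Num, 1-5 bytes).

    ASSUMPTION: the two highest bits of the first byte select the length
    (00: 1 byte, 01: 2 bytes, 10: 4 bytes, 11: 5 bytes). This page does
    not specify the encoding.
    """
    first = stream.read(1)[0]
    kind = first & 0xC0
    if kind == 0x00:  # 1 byte, 6-bit value
        return first
    if kind == 0x40:  # 2 bytes, 14-bit value
        return ((first & 0x3F) << 8) | stream.read(1)[0]
    if kind == 0x80:  # 4 bytes, 30-bit value
        return ((first & 0x3F) << 24) | int.from_bytes(stream.read(3), "big")
    # 5 bytes: marker byte followed by a 32-bit value
    return int.from_bytes(stream.read(4), "big")


def read_token(stream):
    """Token: length (Num) followed by that many UTF-8 bytes."""
    length = read_num(stream)
    return stream.read(length)


def read_double(stream):
    """Double: a number stored as a token."""
    return float(read_token(stream).decode("utf-8"))


def read_boolean(stream):
    """Boolean: one byte, 00 or 01."""
    return stream.read(1) == b"\x01"


def read_nums(stream):
    """Nums: number of entries (Num), then that many Num values."""
    return [read_num(stream) for _ in range(read_num(stream))]


def read_tokens(stream):
    """Tokens: number of entries (Num), then that many Token values."""
    return [read_token(stream) for _ in range(read_num(stream))]
```

For example, `read_tokens(io.BytesIO(b"\x02\x03abc\x01x"))` reads the entry count 2 and then two length-prefixed tokens. The sketch does not handle truncated input.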
The following tables present the layout of BaseX database files. All files are suffixed with .basex.
Meta Data, Name/Path/Doc Indexes: inf
| Description | Format | Method |
|---|---|---|
| 1. Meta Data | Key/value pairs, suffixed by empty key (Token/Token):<br>• PERM → User Permissions | DiskData()<br>MetaData()<br>Users() |
| 2. Main memory indexes | Key/value pairs, suffixed by empty key (Token/Token):<br>• TAGS → Tag Index<br>• ATTS → Attribute Index<br>• PATH → Path Index<br>• NS → Namespaces<br>• DOCS → Document Index | DiskData() |
| 2.1. Name Index<br>Tag/attribute names | 1. Token set, storing all names (TokenSet)<br>2. One StatsKey instance per entry:<br>2.1. Content kind (Num):<br>2.1.1. Number: min/max (Doubles)<br>2.1.2. Category: number of entries (Num), entries (Tokens)<br>2.2. Number of entries (Num)<br>2.3. Leaf flag (Boolean)<br>2.4. Maximum text length (Double; legacy, could be Num) | Names()<br>TokenSet.read()<br>StatsKey() |
| 2.2. Path Index | 1. Flag for path definition (Boolean, always true; legacy)<br>2. PathNode:<br>2.1. Name reference (Num)<br>2.2. Node kind (Num)<br>2.3. Number of occurrences (Num)<br>2.4. Number of children (Num)<br>2.5. Double; legacy, can be reused or discarded<br>2.6. Recursive generation of child nodes (→ 2) | PathSummary()<br>PathNode() |
| 2.3. Namespaces | 1. Token set, storing prefixes (TokenSet)<br>2. Token set, storing URIs (TokenSet)<br>3. NSNode:<br>3.1. pre value (Num)<br>3.2. References to prefix/URI pairs (Nums)<br>3.3. Number of children (Num)<br>3.4. Recursive generation of child nodes (→ 3) | Namespaces()<br>NSNode() |
| 2.4. Document Index | Array of integers, representing the distances between all document pre values (Nums) | DocIndex() |
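
The recursive PathNode layout in row 2.2 can be illustrated with a short Python sketch. It assumes the five fields of each node have already been decoded into plain values and arrive in file order, so no byte-level details are involved. The `PathNode` dataclass and `read_path_node` function are hypothetical illustrations, not the BaseX classes of the same name.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class PathNode:
    name: int       # 2.1. name reference
    kind: int       # 2.2. node kind
    count: int      # 2.3. number of occurrences
    legacy: float   # 2.5. legacy Double
    children: List["PathNode"] = field(default_factory=list)


def read_path_node(values: Iterator) -> PathNode:
    """Rebuild one node and, recursively, all of its descendants.

    `values` yields decoded fields in stored order: name reference,
    node kind, occurrences, number of children, legacy double,
    followed by the serialized child nodes.
    """
    name = next(values)
    kind = next(values)
    count = next(values)
    n_children = next(values)
    legacy = next(values)
    node = PathNode(name, kind, count, legacy)
    for _ in range(n_children):
        # 2.6. each child is stored with the same layout (→ 2)
        node.children.append(read_path_node(values))
    return node


# A root node with two leaf children, flattened in stored order.
flat = [0, 1, 1, 2, 0.0,
        1, 1, 3, 0, 0.0,
        2, 1, 1, 0, 0.0]
root = read_path_node(iter(flat))
```

Because the child count precedes the children, the tree can be rebuilt in a single forward pass without end markers. The NSNode entry in row 2.3 follows the same pattern.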
Node Table: tbl, tbli
* tbl: Main database table, stored in blocks.
* tbli: Database directory, organizing the database blocks.
Texts: txt, atv
* txt: Heap file with text values (document names, string values of texts, comments and processing instructions).
* atv: Attribute values.
Value Indexes: txtl, txtr, atvl, atvr
* txtl: Text index: Heap file with ID lists.
* txtr: Text index: Index file with references to ID lists.
* atvl/atvr: Attribute value index, using the same logic.
Full-Text Fuzzy Index: ftxx, ftxy, ftxz
This index will soon be reimplemented.
Full-Text Trie Index: ftxa, ftxb, ftxc
This index will soon be removed.