Storage Layout
Revision as of 18:17, 26 October 2011
The following data types are used for specifying the storage layout:
Num
: compressed integer (1-5 bytes)
Token
: length (Num) and bytes of the UTF-8 byte representation
Double
: number, stored as Token
Boolean
: boolean (1 byte, 00 or 01)
TokenSet
: key array (Tokens), next/bucket/size arrays (Nums)
Nums, Tokens and Doubles are arrays of values, introduced with the number of entries (Num).
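The primitive types above can be illustrated with a small encoder/decoder. This is a minimal sketch, not the BaseX implementation: the exact bit layout of Num is an assumption (the two high bits of the first byte are taken to select a 1-, 2-, 4- or 5-byte form), Double is assumed to be written as the decimal string produced by Python's `repr`, and all function names are hypothetical.

```python
import io


def write_num(out, v):
    """Num: 32-bit int in 1-5 bytes (assumed layout, not taken from BaseX)."""
    if v < 0 or v > 0x3FFFFFFF:
        out.write(b"\xc0" + (v & 0xFFFFFFFF).to_bytes(4, "big"))  # tag 11 + 4 bytes
    elif v > 0x3FFF:
        out.write((v | 0x80000000).to_bytes(4, "big"))  # tag 10, 30-bit value
    elif v > 0x3F:
        out.write((v | 0x4000).to_bytes(2, "big"))  # tag 01, 14-bit value
    else:
        out.write(bytes([v]))  # tag 00, 6-bit value


def read_num(inp):
    """Inverse of write_num: the first byte's two high bits give the length."""
    b = inp.read(1)[0]
    tag = b >> 6
    if tag == 0:
        return b
    if tag == 1:
        return (b & 0x3F) << 8 | inp.read(1)[0]
    if tag == 2:
        return (b & 0x3F) << 24 | int.from_bytes(inp.read(3), "big")
    v = int.from_bytes(inp.read(4), "big")
    return v - (1 << 32) if v >= 1 << 31 else v  # restore the sign


def write_token(out, s):
    data = s.encode("utf-8")
    write_num(out, len(data))  # Token: byte length (Num) first...
    out.write(data)            # ...then the UTF-8 bytes


def read_token(inp):
    return inp.read(read_num(inp)).decode("utf-8")


def write_double(out, d):
    write_token(out, repr(d))  # Double: number stored as a Token (assumed format)


def read_double(inp):
    return float(read_token(inp))


def write_boolean(out, flag):
    out.write(b"\x01" if flag else b"\x00")  # Boolean: one byte, 00 or 01


def read_boolean(inp):
    return inp.read(1) == b"\x01"


def write_array(out, values, write_one):
    write_num(out, len(values))  # arrays are introduced with their entry count
    for value in values:
        write_one(out, value)


def read_array(inp, read_one):
    return [read_one(inp) for _ in range(read_num(inp))]
```

Under this layout, any value below 64 occupies a single byte, so counts, references and short lengths stay compact.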
The following tables present the layout of BaseX database files. All files are suffixed with .basex.
Meta Data, Name/Path/Doc Indexes: inf
| Description | Format | Method |
|---|---|---|
| 1. Meta Data | Key/value pairs, suffixed by empty key (Token/Token): • PERM → User Permissions | DiskData(), MetaData(), Users() |
| 2. Main memory indexes | Key/value pairs, suffixed by empty key (Token/Token): • TAGS → Tag Index • ATTS → Attribute Index • PATH → Path Index • NS → Namespaces • DOCS → Document Index | DiskData() |
| 2.1. Name Index: tag/attribute names | 1. Token set, storing all names (TokenSet); 2. One StatsKey instance per entry: 2.1. Content kind (Num): 2.1.1. Number: min/max (Doubles); 2.1.2. Category: number of entries (Num), entries (Tokens); 2.2. Number of entries (Num); 2.3. Leaf flag (Boolean); 2.4. Maximum text length (Double; legacy, could be Num) | Names(), TokenSet.read(), StatsKey() |
| 2.2. Path Index | 1. Flag for path definition (Boolean, always true; legacy); 2. PathNode: 2.1. Name reference (Num); 2.2. Node kind (Num); 2.3. Number of occurrences (Num); 2.4. Number of children (Num); 2.5. Double (legacy; can be reused or discarded); 2.6. Recursive generation of child nodes (→ 2) | PathSummary(), PathNode() |
| 2.3. Namespaces | 1. Token set, storing prefixes (TokenSet); 2. Token set, storing URIs (TokenSet); 3. NSNode: 3.1. pre value (Num); 3.2. References to prefix/URI pairs (Nums); 3.3. Number of children (Num); 3.4. Recursive generation of child nodes (→ 3) | Namespaces(), NSNode() |
| 2.4. Document Index | Array of integers, representing the distances between all document pre values (Nums) | DocIndex() |
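Both key/value sections of the inf file end with an empty key, so a reader can loop until it meets one. The sketch below assumes the bytes have already been decoded into a stream of Tokens and Nums; `read_pairs`, the key "name" in the example, and the per-key readers are hypothetical. Keys whose value is a structure (such as PERM, or the index keys of the second section) are handed to a key-specific reader instead of being read as a single Token.

```python
from typing import Callable, Dict, Iterator, Optional


def read_pairs(stream: Iterator, special: Optional[Dict[str, Callable]] = None) -> dict:
    """Read key/value entries until the terminating empty key.

    stream:  iterator over already-decoded Tokens/Nums (assumption).
    special: hypothetical readers for keys whose value is a structure;
             any other key is assumed to be followed by one Token.
    """
    special = special or {}
    result = {}
    for key in stream:
        if key == "":  # the empty key closes the section
            break
        read_value = special.get(key, next)  # default: a single Token
        result[key] = read_value(stream)
    return result
```

For example, a hypothetical DOCS reader that consumes a Nums array (count, then entries) leaves the stream positioned right after the section's empty key.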
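The path index is written depth-first: each PathNode stores its number of children and is directly followed by those children, so it can be read back recursively. This sketch works on already-decoded values, uses a hypothetical `PathNode` class, and reads and discards the legacy Double. NSNode entries follow the same recursive pattern with different fields.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class PathNode:
    """Hypothetical in-memory form of one path index entry."""
    name: int         # 2.1 name reference
    kind: int         # 2.2 node kind
    occurrences: int  # 2.3 number of occurrences
    children: List["PathNode"] = field(default_factory=list)


def read_path_node(stream: Iterator) -> PathNode:
    """Read one PathNode (2.1 to 2.5), then its children depth-first (2.6)."""
    name = next(stream)
    kind = next(stream)
    occurrences = next(stream)
    child_count = next(stream)
    next(stream)  # 2.5 legacy Double: read and discarded
    node = PathNode(name, kind, occurrences)
    node.children = [read_path_node(stream) for _ in range(child_count)]
    return node


def read_path_summary(stream: Iterator) -> Optional[PathNode]:
    """Read the legacy definition flag (1), then the root node (2)."""
    if not next(stream):
        return None
    return read_path_node(stream)
```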
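Storing distances instead of absolute pre values keeps most entries small, which suits the compressed Num format. Assuming the first distance is measured from pre value 0, the absolute values are the running sum of the stored distances; both function names are hypothetical.

```python
from itertools import accumulate
from typing import List


def decode_doc_pres(distances: List[int]) -> List[int]:
    """Absolute document pre values as a running sum of the stored distances."""
    return list(accumulate(distances))


def encode_doc_pres(pres: List[int]) -> List[int]:
    """Distance of each pre value from its predecessor (the first from 0)."""
    return [cur - prev for prev, cur in zip([0] + pres, pres)]
```

For instance, documents starting at pre values 0, 5 and 12 would be stored as the distances 0, 5 and 7.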
Node Table: tbl, tbli
tbl: Main database table, stored in blocks.
tbli: Database directory, organizing the database blocks.
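One way to picture the role of the directory: if blocks are not necessarily full, the block holding a node cannot be found by plain division, so the directory records where each logical block starts. The sketch assumes a directory that holds the first pre value and the physical block number of each logical block, and fixed 16-byte node entries in 4096-byte blocks; these sizes and the `locate` helper are assumptions, not taken from this page.

```python
from bisect import bisect_right
from typing import List, Tuple

NODE_SIZE = 16     # assumed size of one node entry in bytes
BLOCK_SIZE = 4096  # assumed size of one block in bytes


def locate(pre: int, first_pres: List[int], blocks: List[int]) -> Tuple[int, int]:
    """Map a pre value to (physical block, byte offset inside that block).

    first_pres: first pre value of each logical block, ascending (hypothetical
                directory contents); blocks: physical block number of each one.
    """
    i = bisect_right(first_pres, pre) - 1  # last block starting at or before pre
    if i < 0:
        raise IndexError(pre)
    offset = (pre - first_pres[i]) * NODE_SIZE
    if offset >= BLOCK_SIZE:
        raise IndexError(pre)  # beyond the entries one block can hold
    return blocks[i], offset
```

With first pre values 0, 256 and 300, the second logical block holds only 44 nodes, a partially filled block that this kind of directory makes possible.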
Texts: txt, atv
txt: Heap file for text values (document names, string values of texts, comments and processing instructions).
atv: Heap file for attribute values.
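A heap file can be sketched as an append-only byte buffer in which each value is addressed by its byte offset. The `TextHeap` class is hypothetical, and its entry layout is an assumption: for brevity it uses a fixed 4-byte length prefix in place of the variable-length Num used by the Token type.

```python
import io


class TextHeap:
    """Hypothetical append-only heap: values addressed by byte offset."""

    def __init__(self) -> None:
        self.buf = io.BytesIO()

    def add(self, text: str) -> int:
        """Append a value and return its offset (the reference a node would keep)."""
        data = text.encode("utf-8")
        offset = self.buf.seek(0, io.SEEK_END)
        # Assumed entry layout: 4-byte big-endian length, then UTF-8 bytes.
        self.buf.write(len(data).to_bytes(4, "big") + data)
        return offset

    def get(self, offset: int) -> str:
        """Read the value stored at the given offset."""
        self.buf.seek(offset)
        length = int.from_bytes(self.buf.read(4), "big")
        return self.buf.read(length).decode("utf-8")
```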
Value Indexes: txtl, txtr, atvl, atvr
Text Index:
txtl: Heap file with ID lists.
txtr: Index file with references to ID lists.
Attribute Index:
atvl: Heap file with ID lists.
atvr: Index file with references to ID lists.
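The split between the ID-list heap (txtl, atvl) and the reference file (txtr, atvr) can be sketched in memory: the heap holds one length-prefixed list of node IDs per distinct value, and the references hold the sorted values with the offset of their list, so a lookup is a binary search followed by one read from the heap. The function names and the (value, offset) entry layout are assumptions for illustration; this page does not specify how the reference file is organized.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Tuple


def build_value_index(values: Dict[int, str]) -> Tuple[List[int], List[Tuple[str, int]]]:
    """Build (heap, refs) from a map of node ID -> string value.

    heap: concatenated ID lists, each prefixed by its length (role of txtl/atvl).
    refs: (value, heap offset) pairs sorted by value (role of txtr/atvr).
    """
    ids: Dict[str, List[int]] = defaultdict(list)
    for node_id, value in sorted(values.items()):  # keeps each ID list ascending
        ids[value].append(node_id)
    heap: List[int] = []
    refs: List[Tuple[str, int]] = []
    for value in sorted(ids):
        refs.append((value, len(heap)))
        heap.append(len(ids[value]))
        heap.extend(ids[value])
    return heap, refs


def lookup(heap: List[int], refs: List[Tuple[str, int]], value: str) -> List[int]:
    """Binary-search the references, then read one ID list from the heap."""
    i = bisect_left(refs, (value,))
    if i == len(refs) or refs[i][0] != value:
        return []
    offset = refs[i][1]
    return heap[offset + 1 : offset + 1 + heap[offset]]
```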
Full-Text Fuzzy Index: ftxx, ftxy, ftxz
This index will soon be reimplemented.
Full-Text Trie Index: ftxa, ftxb, ftxc
This index will soon be removed.