Difference between revisions of "Storage Layout"

Revision as of 19:15, 26 October 2011

The following data types are used for specifying the storage layout:

Num: compressed integer (1-5 bytes)
Token: length (Num), followed by the bytes of the UTF-8 representation
Double: number, stored as a Token
Boolean: boolean (1 byte, 00 or 01)
TokenSet: key array (Tokens), next/bucket/size arrays (Nums)
Nums, Tokens and Doubles are arrays of values, each introduced by its number of entries (Num)
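
The data types above can be sketched as a small set of reader and writer functions. This is a minimal illustration, not the BaseX implementation: the page only states that a Num takes 1-5 bytes, so the tag-bit scheme used here (the two highest bits of the first byte select a total length of 1, 2, 4 or 5 bytes) is an assumption, as are all function names and the decision to parse a Double from the decimal text of its Token.

```python
import io


def write_num(value):
    """Encode a non-negative 32-bit integer in 1, 2, 4 or 5 bytes (assumed scheme)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value out of range")
    if value < 0x40:                       # tag 00: 6 bits in one byte
        return bytes([value])
    if value < 0x4000:                     # tag 01: 14 bits in two bytes
        return bytes([0x40 | (value >> 8), value & 0xFF])
    if value < 0x40000000:                 # tag 10: 30 bits in four bytes
        return bytes([0x80 | (value >> 24), (value >> 16) & 0xFF,
                      (value >> 8) & 0xFF, value & 0xFF])
    return bytes([0xC0]) + value.to_bytes(4, "big")   # tag 11: 32 bits follow


def _read_exact(stream, n):
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("unexpected end of input")
    return data


def read_num(stream):
    """Decode one Num written by write_num."""
    first = _read_exact(stream, 1)[0]
    tag = first >> 6
    if tag == 0:
        return first
    if tag == 1:
        return ((first & 0x3F) << 8) | _read_exact(stream, 1)[0]
    if tag == 2:
        rest = _read_exact(stream, 3)
        return ((first & 0x3F) << 24) | int.from_bytes(rest, "big")
    return int.from_bytes(_read_exact(stream, 4), "big")


def write_token(text):
    """Token: length (Num) followed by the UTF-8 bytes."""
    data = text.encode("utf-8")
    return write_num(len(data)) + data


def read_token(stream):
    length = read_num(stream)
    return _read_exact(stream, length).decode("utf-8")


def read_double(stream):
    """Double: stored as a Token; assumed here to hold decimal text."""
    return float(read_token(stream))


def read_bool(stream):
    """Boolean: one byte, 00 or 01."""
    return _read_exact(stream, 1) == b"\x01"


def read_nums(stream):
    """Nums: number of entries (Num), then that many Num values."""
    count = read_num(stream)
    return [read_num(stream) for _ in range(count)]


def read_tokens(stream):
    """Tokens: number of entries (Num), then that many Token values."""
    count = read_num(stream)
    return [read_token(stream) for _ in range(count)]


stream = io.BytesIO(write_token("inf") + write_num(300))
print(read_token(stream), read_num(stream))  # prints: inf 300
```

Small values dominate in such files (name references, child counts), which is why a variable-length integer pays off: anything below 64 costs a single byte under the assumed scheme.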

The following tables present the layout of BaseX database files. All files are suffixed with .basex.

Meta Data, Name/Path/Doc Indexes: `inf`

Each entry below gives a description, its format and the method(s) that handle it.

1. Meta Data
   Format: Key/value pairs, suffixed by empty key (`Token`/`Token`):
     • `PERM` → User Permissions
   Method: DiskData(), MetaData(), Users()

2. Main memory indexes
   Format: Key/value pairs, suffixed by empty key (`Token`/`Token`):
     • `TAGS` → Tag Index
     • `ATTS` → Attribute Index
     • `PATH` → Path Index
     • `NS` → Namespaces
     • `DOCS` → Document Index
   Method: DiskData()

2.1. Name Index (tag/attribute names)
   Format:
     1. Token set, storing all names (`TokenSet`)
     2. One StatsKey instance per entry:
        2.1. Content kind (`Num`):
             2.1.1. Number: min/max (`Doubles`)
             2.1.2. Category: number of entries (`Num`), entries (`Tokens`)
        2.2. Number of entries (`Num`)
        2.3. Leaf flag (`Boolean`)
        2.4. Maximum text length (`Double`; legacy, could be `Num`)
   Method: Names(), TokenSet.read(), StatsKey()

2.2. Path Index
   Format:
     1. Flag for path definition (`Boolean`, always `true`; legacy)
     2. PathNode:
        2.1. Name reference (`Num`)
        2.2. Node kind (`Num`)
        2.3. Number of occurrences (`Num`)
        2.4. Number of children (`Num`)
        2.5. `Double`; legacy, can be reused or discarded
        2.6. Recursive generation of child nodes (→ 2)
   Method: PathSummary(), PathNode()

2.3. Namespaces
   Format:
     1. Token set, storing prefixes (`TokenSet`)
     2. Token set, storing URIs (`TokenSet`)
     3. NSNode:
        3.1. pre value (`Num`)
        3.2. References to prefix/URI pairs (`Nums`)
        3.3. Number of children (`Num`)
        3.4. Recursive generation of child nodes (→ 3)
   Method: Namespaces(), NSNode()

2.4. Document Index
   Format: Array of integers, representing the distances between all document pre values (`Nums`)
   Method: DocIndex()
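
Two of the structures listed above lend themselves to a short sketch: the recursive PathNode layout and the distance-based Document Index. The code below is illustrative only. It works on values that have already been decoded (so it makes no claim about the byte encoding), the class and function names are hypothetical, and it assumes that the first distance in the Document Index is measured from pre value 0.

```python
from dataclasses import dataclass, field
from itertools import accumulate
from typing import Iterator, List


@dataclass
class PathNode:
    name: int            # name reference (Num)
    kind: int            # node kind (Num)
    occurrences: int     # number of occurrences (Num)
    children: List["PathNode"] = field(default_factory=list)


def read_path_node(values: Iterator) -> PathNode:
    """Read one PathNode and, recursively, all of its children.

    `values` yields the decoded fields in file order: name reference,
    node kind, number of occurrences, number of children, legacy Double.
    """
    name = next(values)
    kind = next(values)
    occurrences = next(values)
    child_count = next(values)
    next(values)  # legacy Double: read and discarded
    node = PathNode(name, kind, occurrences)
    for _ in range(child_count):
        node.children.append(read_path_node(values))
    return node


def document_pre_values(distances):
    """Turn stored distances back into absolute pre values (running sum).

    Assumption: the first distance is the offset from pre value 0.
    """
    return list(accumulate(distances))


# A root node with two leaf children, followed by a three-document index.
fields = iter([1, 1, 10, 2, 0.0,
               2, 1, 4, 0, 0.0,
               3, 1, 6, 0, 0.0])
root = read_path_node(fields)
print(len(root.children), document_pre_values([0, 5, 3]))
```

Storing distances rather than absolute pre values keeps the numbers small, which suits a compressed integer type such as `Num`.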

Node Table: `tbl`, `tbli`

tbl: Main database table, stored in blocks.
tbli: Database directory, organizing the database blocks.

Texts: `txt`, `atv`

txt: Heap file with text values (document names, string values of texts, comments and processing instructions).
atv: Heap file with attribute values.

Value Indexes: `txtl`, `txtr`, `atvl`, `atvr`

txtl: Text index: Heap file with ID lists.
txtr: Text index: Index file with references to ID lists.
atvl/atvr: Attribute value index, using the same logic as the text index (`atvl` holds the ID lists, `atvr` the references to them).
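
The split into a heap file with ID lists and an index file with references can be illustrated as follows. This is a toy model under stated assumptions, not the on-disk format: the page does not specify entry widths or how ID lists are encoded, so the fixed 4-byte big-endian offsets, the 4-byte count prefix and the names `build_index` and `lookup` are all hypothetical.

```python
import struct


def build_index(id_lists):
    """Return (heap, refs): the heap holds the ID lists, refs one offset per list."""
    heap = bytearray()
    refs = bytearray()
    for ids in id_lists:
        refs += struct.pack(">I", len(heap))         # reference: offset into the heap
        heap += struct.pack(">I", len(ids))          # list header: number of IDs
        heap += struct.pack(f">{len(ids)}I", *ids)   # the IDs themselves
    return bytes(heap), bytes(refs)


def lookup(heap, refs, slot):
    """Follow the reference in `slot` and return the ID list it points to."""
    (offset,) = struct.unpack_from(">I", refs, slot * 4)
    (count,) = struct.unpack_from(">I", heap, offset)
    return list(struct.unpack_from(f">{count}I", heap, offset + 4))


heap, refs = build_index([[3, 7], [], [42]])
print(lookup(heap, refs, 0), lookup(heap, refs, 2))
```

Because the references have a fixed width, the entry for a given slot can be found without scanning, while the variable-length ID lists live in the heap.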

Full-Text Fuzzy Index: `ftxx`, `ftxy`, `ftxz`

The fuzzy full-text index will soon be reimplemented.

Full-Text Trie Index: `ftxa`, `ftxb`, `ftxc`

The trie full-text index will soon be removed.

@@ Line 1: / Line 1: @@
-==Data Types==
+The following data types are used for specifying the storage layout:
 * {{Type|Num}}: compressed integer (1-5 bytes)
 * {{Type|Token}}: length ({{Type|Num}}) and bytes of UTF8 byte representation
@@ Line 7: / Line 7: @@
 * {{Type|Nums}}, {{Type|Tokens}} and {{Type|Doubles}} are arrays of values, and introduced with the number of entries ({{Type|Num}})
-==inf.basex==
+The following tables present the layout of BaseX database files. All files are suffixed with <code>.basex</code>.
-'''Contents:''' Meta information on a database and main memory indexes.
+==Meta Data, Name/Path/Doc Indexes: <code>inf</code>==
 {| class="wikitable" width="100%"
@@ Line 42: / Line 42: @@
 |}
-==(tbl|tbli).basex==
+==Node Table: <code>tbl</code>, <code>tbli</code>==
-'''Contents:''' Main database table and directory.
+* <code>tbl</code>: Main database table, stored in blocks.
+* <code>tbli</code>: Database directory, organizing the database blocks.
-==txt.basex==
+==Texts: <code>txt</code>, <code>atv</code>==
-'''Contents:''' Heap file with text values (document names, string values of texts, comments and processing instructions).
+* <code>txt</code>: Heap file with text values (document names, string values of texts, comments and processing instructions)
+* <code>atv</code>: attribute values.
-==atv.basex==
+==Value Indexes: <code>txtl</code>, <code>txtr</code>, <code>atvl</code>, <code>atvr</code>==
-'''Contents:''' Heap file with attribute values.
+* <code>txtl</code>: Text index: Heap file with ID lists.
+* <code>txtr</code>: Text index: Index file with references to ID lists.
+* <code>atvl</code>/<code>atvr</code>: Attribute value index, using same logic.
-==(txtl|txtr).basex==
+==Full-Text Fuzzy Index: <code>ftxx</code>, <code>ftxy</code>, <code>ftxz</code>==
-'''Contents:''' Value index for texts.
+...will soon be reimplemented.
-==(atvl|atvr).basex==
+==Full-Text Trie Index: <code>ftxa</code>, <code>ftxb</code>, <code>ftxc</code>==
-'''Contents:''' Value index for attributes.
+...will soon be dismissed.
-==(ftxa|ftxb|ftxc).basex==
-'''Contents:''' Trie full-text index.
-==(ftxx|ftxy|ftxz).basex==
-'''Contents:''' Fuzzy full-text index.
