Latest revision as of 10:17, 9 March 2023

This article is part of the Advanced User's Guide. It presents some low-level details on how data is stored in the database files.

Data Types[edit]

The following data types are used for specifying the storage layout:

Type	Description	Example (native → hex integers)
`Num`	Compressed integer (1-5 bytes), specified in Num.java	`15` → `0F`; `511` → `41 FF`
`Token`	Length (`Num`) and bytes of UTF8 byte representation	`Hello` → `05 48 65 6c 6c 6f`
`Double`	Number, stored as token	`123` → `03 31 32 33`
`Boolean`	Boolean (1 byte, `00` or `01`)	`true` → `01`
`Nums`, `Tokens`, `Doubles`	Arrays of values, introduced with the number of entries	`1,2` → `02 01 31 01 32`
`TokenSet`	Key array (`Tokens`), next/bucket/size arrays (3x `Nums`)

Database Files[edit]

The following tables illustrate the layout of the BaseX database files. All files are suffixed with .basex.

Metadata, Name/Path/Doc Indexes: `inf`[edit]

Description	Format
1. Metadata	1. Key/value pairs, in no particular order (`Token`/`Token`): • Examples: `FNAME`, `TIME`, `SIZE`, ... • `PERM` → Number of users (`Num`), and name/password/permission values for each user (`Token`/`Token`/`Num`) 2. Empty key as finalizer
2. Main memory indexes	1. Key/value pairs, in no particular order (`Token`/`Token`): • `TAGS` → Element Name Index • `ATTS` → Attribute Name Index • `PATH` → Path Index • `NS` → Namespaces • `DOCS` → Document Index 2. Empty key as finalizer
2 a) Name Index Element/attribute names	1. Token set, storing all names (`TokenSet`) 2. One StatsKey instance per entry: 2.1. Content kind (`Num`): 2.1.1. Number: min/max (`Doubles`) 2.1.2. Category: number of entries (`Num`), entries (`Tokens`) 2.2. Number of entries (`Num`) 2.3. Leaf flag (`Boolean`) 2.4. Maximum text length (`Double`; legacy, could be `Num`)
2 b) Path Index	1. Flag for path definition (`Boolean`, always `true`; legacy) 2. PathNode: 2.1. Name reference (`Num`) 2.2. Node kind (`Num`) 2.3. Number of occurrences (`Num`) 2.4. Number of children (`Num`) 2.5. `Double`; legacy, can be reused or discarded 2.6. Recursive generation of child nodes (→ 2)
2 c) Namespaces	1. Token set, storing prefixes (`TokenSet`) 2. Token set, storing URIs (`TokenSet`) 3. NSNode: 3.1. pre value (`Num`) 3.2. References to prefix/URI pairs (`Nums`) 3.3. Number of children (`Num`) 3.4. Recursive generation of child nodes (→ 3)
2 d) Document Index	Array of integers, representing the distances between all document pre values (`Nums`)

Node Table: `tbl`, `tbli`[edit]

tbl: Main database table, stored in blocks.
tbli: Database directory, organizing the database blocks.

Some more information on the node storage is available.

Texts: `txt`, `atv`[edit]

txt: Heap file for text values (document names, string values of texts, comments and processing instructions)
atv: Heap file for attribute values.

Value Indexes: `txtl`, `txtr`, `atvl`, `atvr`[edit]

Text Index:

txtl: Heap file with ID lists.
txtr: Index file with references to ID lists.

The Attribute Index is contained in the files atvl and atvr, the Token Index in tokl and tokr. All have the same layout.

For a more detailed discussion and examples of these file formats please see Index File Structure.

Document Path Index: `pth`[edit]

Provides an index of all the document paths in the database. For databases with a large number of paths this file can be quite large so it is only generated the first time a function requesting a path lookup is run. For databases where path lookups are never used this file will not exist.

Note: On Windows/Mac systems this file is case insensitive (all paths are lower case). On UNIX-like systems this file is case sensitive. The behaviour of path look ups will vary between systems. Copying this file between system types may lead to unexpected behaviour.

ID/Pre Mapping: `idp`[edit]

This file is only created if incremental indexing (UPDINDEX) is enabled for a database. It is used to provide a quick look up of the pre value for a database node id.

Full-Text Fuzzy Index: `ftxx`, `ftxy`, `ftxz`[edit]

...may soon be reimplemented.

@@ Line 1: / Line 1: @@
-==Data Types==
+This article is part of the [[Advanced User's Guide]]. It presents some low-level details on how data is stored in the database files.
-* {{Type|Num}}: compressed integer (1-5 bytes)
-* {{Type|Token}}: length ({{Type|Num}}) and bytes of UTF8 byte representation
-* {{Type|Double}}: number, stored as token
-* {{Type|Boolean}}: boolean (1 byte, <code>00</code> or <code>01</code>)
-* {{Type|TokenSet}}: key array (<code>Tokens</code>), next/bucket/size arrays (<code>Nums</code>)
-* {{Type|Nums}}, {{Type|Tokens}} and {{Type|Doubles}} are arrays of values, and introduced with the number of entries ({{Type|Num}})
-==inf.basex==
+=Data Types=
-'''Contents:''' Meta information on a database and main memory indexes.
+The following data types are used for specifying the storage layout:
-{| class="wikitable" width="100%"
+{| class="wikitable"
-|-
+|- valign="top"
+! Type
+! Description
+! Example (native → hex integers)
+|- valign="top"
+| {{Type|Num}}
+| Compressed integer (1-5 bytes), specified in [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/Num.java Num.java]
+| {{Code|15}} → {{Code|0F}}; {{Code|511}} → {{Code|41 FF}}<br/>
+|- valign="top"
+| {{Type|Token}}
+| Length ({{Type|Num}}) and bytes of UTF8 byte representation
+| {{Code|Hello}} → {{Code|05 48 65 6c 6c 6f}}
+|- valign="top"
+| {{Type|Double}}
+| Number, stored as token
+| {{Code|123}} → {{Code|03 31 32 33}}
+|- valign="top"
+| {{Type|Boolean}}
+| Boolean (1 byte, {{Code|00}} or {{Code|01}})
+| {{Code|true}} → {{Code|01}}
+|- valign="top"
+| {{Type|Nums}}, {{Type|Tokens}}, {{Type|Doubles}}
+| Arrays of values, introduced with the number of entries
+| {{Code|1,2}} → {{Code|02 01 31 01 32}}
+|- valign="top"
+| {{Type|TokenSet}}
+| Key array ({{Type|Tokens}}), next/bucket/size arrays (3x {{Type|Nums}})
+|
+|}
+=Database Files=
+The following tables illustrate the layout of the BaseX database files. All files are suffixed with {{Code|.basex}}.
+==Metadata, Name/Path/Doc Indexes: {{Code|inf}}==
+{| class="wikitable"
+|- valign="top"
 ! Description
 ! Format
-! Method
+|- valign="top"
-|-
+| '''1. Metadata'''
-| valign='top' | '''1. Meta Data'''
+| 1. Key/value pairs, in no particular order ({{Type|Token}}/{{Type|Token}}):<br/>&nbsp; &bull; Examples: {{Code|FNAME}}, {{Code|TIME}}, {{Code|SIZE}}, ...<br/>&nbsp; &bull; {{Code|PERM}} → Number of users ({{Type|Num}}), and name/password/permission values for each user ({{Type|Token}}/{{Type|Token}}/{{Type|Num}})<br/>2. Empty key as finalizer
-| valign='top' | Key/value pairs, suffixed by empty key ({{Type|Token}}/{{Type|Token}}):<br />&bull; <code>PERM</code> → User Permissions
+|- valign="top"
-| valign='top' | [https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/data/DiskData.java DiskData()]<br/>[https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/data/MetaData.java MetaData()]<br/>[https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/core/Users.java Users()]
+| '''2. Main memory indexes'''
-|-
+| 1. Key/value pairs, in no particular order ({{Type|Token}}/{{Type|Token}}):<br/>&nbsp; &bull; {{Code|TAGS}} → Element Name Index<br/>&nbsp; &bull; {{Code|ATTS}} → Attribute Name Index<br/>&nbsp; &bull; {{Code|PATH}} → Path Index<br/>&nbsp; &bull; {{Code|NS}} → Namespaces<br/>&nbsp; &bull; {{Code|DOCS}} → Document Index<br/>2. Empty key as finalizer
-| valign='top' | '''2. Main memory indexes'''
+|- valign="top"
-| Key/value pairs, suffixed by empty key ({{Type|Token}}/{{Type|Token}}):<br />&bull; <code>TAGS</code> → Tag Index<br />&bull; <code>ATTS</code> → Attribute Index<br />&bull; <code>PATH</code> → Path Index<br />&bull; <code>NS</code> → Namespaces<br />&bull; <code>DOCS</code> → Document Index
+| '''2 a) Name Index'''<br/>Element/attribute names
-| valign='top' | [https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/data/DiskData.java DiskData()]
+| 1. Token set, storing all names ({{Type|TokenSet}})<br/>2. One StatsKey instance per entry:<br/>2.1. Content kind ({{Type|Num}}):<br/>2.1.1. Number: min/max ({{Type|Doubles}})<br/>2.1.2. Category: number of entries ({{Type|Num}}), entries ({{Type|Tokens}})<br/>2.2. Number of entries ({{Type|Num}})<br/>2.3. Leaf flag ({{Type|Boolean}})<br/>2.4. Maximum text length ({{Type|Double}}; legacy, could be {{Type|Num}})
-|-
+|- valign="top"
-| valign='top' | '''2.1. Name Index'''<br/>Tag/attribute names
+| '''2 b) Path Index'''
-| 1. Token set, storing all names ({{Type|TokenSet}})<br />2. One StatsKey instance per entry:<br/>2.1. Content kind ({{Type|Num}}):<br />2.1.1. Number: min/max ({{Type|Doubles}})<br />2.1.2. Category: number of entries ({{Type|Num}}), entries ({{Type|Tokens}})<br />2.2. Number of entries ({{Type|Num}})<br />2.3. Leaf flag ({{Type|Boolean}})<br />2.4. Maximum text length ({{Type|Double}}; legacy, could be {{Type|Num}})
+| 1. Flag for path definition ({{Type|Boolean}}, always {{Code|true}}; legacy)<br/>2. PathNode:<br/>2.1. Name reference ({{Type|Num}})<br/>2.2. Node kind ({{Type|Num}})<br/>2.3. Number of occurrences ({{Type|Num}})<br/>2.4. Number of children ({{Type|Num}})<br/>2.5. {{Type|Double}}; legacy, can be reused or discarded<br/>2.6. Recursive generation of child nodes (→ 2)
-| valign='top' | [https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/index/Names.java Names()]<br/>[https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/util/hash/TokenSet.java TokenSet.read()]<br/>[https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/index/StatsKey.java StatsKey()]
+|- valign="top"
-|-
+| '''2 c) Namespaces'''
-| valign='top' | '''2.2. Path Index'''
-| 1. Flag for path definition ({{Type|Boolean}}, always <code>true</code>; legacy)<br/>2. PathNode:<br/>2.1. Name reference ({{Type|Num}})<br/>2.2. Node kind ({{Type|Num}})<br/>2.3. Number of occurrences ({{Type|Num}})<br/>2.4. Number of children ({{Type|Num}})<br/>2.5. {{Type|Double}}; legacy, can be reused or discarded<br/>2.6. Recursive generation of child nodes (→ 2)
-| valign='top' | [https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/index/path/PathSummary.java PathSummary()]<br/>[https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/index/path/PathNode.java PathNode()]
-|-
-| valign='top' | '''2.3. Namespaces'''
 | 1. Token set, storing prefixes ({{Type|TokenSet}})<br/>2. Token set, storing URIs ({{Type|TokenSet}})<br/>3. NSNode:<br/>3.1. pre value ({{Type|Num}})<br/>3.2. References to prefix/URI pairs ({{Type|Nums}})<br/>3.3. Number of children ({{Type|Num}})<br/>3.4. Recursive generation of child nodes (→ 3)
-| valign='top' | [https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/data/Namespaces.java Namespaces()]<br/>[https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/data/NSNode.java NSNode()]
+|- valign="top"
-|-
+| '''2 d) Document Index'''
-| valign='top' | '''2.4. Document Index'''
 | Array of integers, representing the distances between all document pre values ({{Type|Nums}})
-| valign='top' | [https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/index/DocIndex.java DocIndex()]
 |}
-==(tbl|tbli).basex==
+==Node Table: {{Code|tbl}}, {{Code|tbli}}==
+* {{Code|tbl}}: Main database table, stored in blocks.
+* {{Code|tbli}}: Database directory, organizing the database blocks.
-'''Contents:''' Main database table and directory.
+Some more information on the [[Node Storage|node storage]] is available.
-==txt.basex==
+==Texts: {{Code|txt}}, {{Code|atv}}==
-'''Contents:''' Heap file with text values (document names, string values of texts, comments and processing instructions).
+* {{Code|txt}}: Heap file for text values (document names, string values of texts, comments and processing instructions)
+* {{Code|atv}}: Heap file for attribute values.
-==atv.basex==
+==Value Indexes: {{Code|txtl}}, {{Code|txtr}}, {{Code|atvl}}, {{Code|atvr}}==
-'''Contents:''' Heap file with attribute values.
+'''Text Index:'''
+* {{Code|txtl}}: Heap file with ID lists.
+* {{Code|txtr}}: Index file with references to ID lists.
+The '''Attribute Index''' is contained in the files {{Code|atvl}} and {{Code|atvr}}, the '''Token Index''' in {{Code|tokl}} and {{Code|tokr}}. All have the same layout.
-==(txtl|txtr).basex==
+For a more detailed discussion and examples of these file formats please see [[Index File Structure]].
-'''Contents:''' Value index for texts.
+==Document Path Index: {{Code|pth}}==
-==(atvl|atvr).basex==
+Provides an index of all the document paths in the database. For databases with a large number of paths this file can be quite large so it is only generated the first time a function requesting a path lookup is run. For databases where path lookups are never used this file will not exist.
-'''Contents:''' Value index for attributes.
+'''Note:''' On Windows/Mac systems this file is case insensitive (all paths are lower case). On UNIX-like systems this file is case sensitive. The behaviour of path look ups will vary between systems. Copying this file between system types may lead to unexpected behaviour.
-==(ftxa|ftxb|ftxc).basex==
+==ID/Pre Mapping: {{Code|idp}}==
-'''Contents:''' Trie full-text index.
+This file is only created if incremental indexing (UPDINDEX) is enabled for a database. It is used to provide a quick look up of the pre value for a database node id.
-==(ftxx|ftxy|ftxz).basex==
+==Full-Text Fuzzy Index: {{Code|ftxx}}, {{Code|ftxy}}, {{Code|ftxz}}==
-'''Contents:''' Fuzzy full-text index.
+...may soon be reimplemented.

Difference between revisions of "Storage Layout"

Latest revision as of 10:17, 9 March 2023

Contents

Data Types[edit]

Database Files[edit]

Metadata, Name/Path/Doc Indexes: `inf`[edit]

Node Table: `tbl`, `tbli`[edit]

Texts: `txt`, `atv`[edit]

Value Indexes: `txtl`, `txtr`, `atvl`, `atvr`[edit]

Document Path Index: `pth`[edit]

ID/Pre Mapping: `idp`[edit]

Full-Text Fuzzy Index: `ftxx`, `ftxy`, `ftxz`[edit]

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools

Difference between revisions of "Storage Layout"

Latest revision as of 10:17, 9 March 2023

Contents

Data Types[edit]

Database Files[edit]

Metadata, Name/Path/Doc Indexes: inf[edit]

Node Table: tbl, tbli[edit]

Texts: txt, atv[edit]

Value Indexes: txtl, txtr, atvl, atvr[edit]

Document Path Index: pth[edit]

ID/Pre Mapping: idp[edit]

Full-Text Fuzzy Index: ftxx, ftxy, ftxz[edit]

Navigation menu

Search

Metadata, Name/Path/Doc Indexes: `inf`[edit]

Node Table: `tbl`, `tbli`[edit]

Texts: `txt`, `atv`[edit]

Value Indexes: `txtl`, `txtr`, `atvl`, `atvr`[edit]

Document Path Index: `pth`[edit]

ID/Pre Mapping: `idp`[edit]

Full-Text Fuzzy Index: `ftxx`, `ftxy`, `ftxz`[edit]