Changes

1,025 bytes added ,  00:52, 13 December 2020
Added information regarding pth.basex and idp.basex
This article is part of the [[Advanced User's Guide]]. It presents some low-level details on how data is stored in the database files.
==Data Types==
The following data types are used for specifying the storage layout:
{| class="wikitable"
|-
! Type
! Description
! Example
|-
| {{Type|Num}}
| Compressed integer (1-5 bytes), specified in [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/Num.java Num.java]
| {{MonoCode|15}} → {{MonoCode|0F}}; {{MonoCode|511}} → {{MonoCode|41 FF}}
|-
| {{Type|Token}}
| Length ({{Type|Num}}) and bytes of UTF8 byte representation
| {{MonoCode|Hello}} → {{MonoCode|05 48 65 6C 6C 6F}}
|-
| {{Type|Double}}
| Number, stored as token
| {{MonoCode|123}} → {{MonoCode|03 31 32 33}}
|-
| {{Type|Boolean}}
| Boolean (1 byte, {{Code|00}} or {{Code|01}})
| {{MonoCode|true}} → {{MonoCode|01}}
|-
| {{Type|Nums}}, {{Type|Tokens}}, {{Type|Doubles}}
| Arrays of values, introduced with the number of entries
| {{MonoCode|1,2}} → {{MonoCode|02 01 31 01 32}}
|-
| {{Type|TokenSet}}
|}
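
As a rough illustration of the encodings listed above, the following Java sketch produces the byte sequences for {{Type|Num}}, {{Type|Token}} and {{Type|Tokens}} values. The class and method names are hypothetical and not part of BaseX. The 1-byte and 2-byte {{Type|Num}} forms follow the examples in the table; the 4-byte and 5-byte forms are an assumption based on a reading of Num.java and should be checked against that source.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not a BaseX class): encodes Num, Token and Tokens values. */
class StorageTypesSketch {
  /** Compressed integer. The 4- and 5-byte branches are assumed, see lead-in. */
  static byte[] num(int v) {
    if (v < 0 || v > 0x3FFFFFFF) {
      // assumed 5-byte form: marker byte C0, followed by the full 32-bit value
      return new byte[] { (byte) 0xC0, (byte) (v >>> 24), (byte) (v >>> 16),
          (byte) (v >>> 8), (byte) v };
    }
    if (v > 0x3FFF) {
      // assumed 4-byte form: top two bits 10, then 30 bits of payload
      return new byte[] { (byte) (v >>> 24 | 0x80), (byte) (v >>> 16),
          (byte) (v >>> 8), (byte) v };
    }
    if (v > 0x3F) {
      // 2-byte form: top two bits 01, then 14 bits of payload (511 -> 41 FF)
      return new byte[] { (byte) (v >>> 8 | 0x40), (byte) v };
    }
    // 1-byte form: top two bits 00, then 6 bits of payload (15 -> 0F)
    return new byte[] { (byte) v };
  }

  /** Token: length as Num, followed by the UTF-8 bytes. */
  static byte[] token(String s) {
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    return concat(num(utf8.length), utf8);
  }

  /** Tokens: number of entries as Num, followed by each entry as Token. */
  static byte[] tokens(String... values) {
    byte[] result = num(values.length);
    for (String v : values) result = concat(result, token(v));
    return result;
  }

  /** Joins byte arrays in the given order. */
  static byte[] concat(byte[]... parts) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] p : parts) out.write(p, 0, p.length);
    return out.toByteArray();
  }

  /** Formats bytes as space-separated upper-case hex pairs. */
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(String.format("%02X", b & 0xFF));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(hex(num(15)));           // 0F
    System.out.println(hex(num(511)));          // 41 FF
    System.out.println(hex(token("Hello")));    // 05 48 65 6C 6C 6F
    System.out.println(hex(token("123")));      // 03 31 32 33
    System.out.println(hex(tokens("1", "2")));  // 02 01 31 01 32
  }
}
```

The printed values match the examples in the table, which is the only check this sketch offers; it does not read or write actual database files.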
==Database Files==
The following tables illustrate the layout of the BaseX database files. All files are suffixed with {{Code|.basex}}.
===Meta Data, Name/Path/Doc Indexes: {{Code|inf}}===
{| class="wikitable" width="100%"
! Description
! Format
! Method
|-
| valign='top' | '''1. Meta Data'''
| valign='top' | 1. Key/value pairs, in no particular order ({{Type|Token}}/{{Type|Token}}):<br/>&nbsp; &bull; Examples: {{Code|FNAME}}, {{Code|TIME}}, {{Code|SIZE}}, ...<br/>&nbsp; &bull; {{Code|PERM}} → Number of users ({{Type|Num}}), and name/password/permission values for each user ({{Type|Token}}/{{Type|Token}}/{{Type|Num}})<br/>2. Empty key as finalizer
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/DiskData.java DiskData()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/MetaData.java MetaData()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/core/Users.java Users()]
|-
| valign='top' | '''2. Main memory indexes'''
| 1. Key/value pairs, in no particular order ({{Type|Token}}/{{Type|Token}}):<br/>&nbsp; &bull; {{Code|TAGS}} → Element Name Index<br/>&nbsp; &bull; {{Code|ATTS}} → Attribute Name Index<br/>&nbsp; &bull; {{Code|PATH}} → Path Index<br/>&nbsp; &bull; {{Code|NS}} → Namespaces<br/>&nbsp; &bull; {{Code|DOCS}} → Document Index<br/>2. Empty key as finalizer
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/DiskData.java DiskData()]
|-
| valign='top' | '''2 a) Name Index'''<br/>Element/attribute names
| 1. Token set, storing all names ({{Type|TokenSet}})<br />2. One StatsKey instance per entry:<br/>2.1. Content kind ({{Type|Num}}):<br />2.1.1. Number: min/max ({{Type|Doubles}})<br />2.1.2. Category: number of entries ({{Type|Num}}), entries ({{Type|Tokens}})<br />2.2. Number of entries ({{Type|Num}})<br />2.3. Leaf flag ({{Type|Boolean}})<br />2.4. Maximum text length ({{Type|Double}}; legacy, could be {{Type|Num}})
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/index/Names.java Names()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/hash/TokenSet.java TokenSet.read()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/index/StatsKey.java StatsKey()]
|-
| valign='top' | '''2 b) Path Index'''
| 1. Flag for path definition ({{Type|Boolean}}, always {{Code|true}}; legacy)<br/>2. PathNode:<br/>2.1. Name reference ({{Type|Num}})<br/>2.2. Node kind ({{Type|Num}})<br/>2.3. Number of occurrences ({{Type|Num}})<br/>2.4. Number of children ({{Type|Num}})<br/>2.5. {{Type|Double}}; legacy, can be reused or discarded<br/>2.6. Recursive generation of child nodes (→ 2)
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/index/path/PathSummary.java PathSummary()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/index/path/PathNode.java PathNode()]
|-
| valign='top' | '''2 c) Namespaces'''
| 1. Token set, storing prefixes ({{Type|TokenSet}})<br/>2. Token set, storing URIs ({{Type|TokenSet}})<br/>3. NSNode:<br/>3.1. pre value ({{Type|Num}})<br/>3.2. References to prefix/URI pairs ({{Type|Nums}})<br/>3.3. Number of children ({{Type|Num}})<br/>3.4. Recursive generation of child nodes (→ 3)
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/Namespaces.java Namespaces()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/NSNode.java NSNode()]
|-
| valign='top' | '''2 d) Document Index'''
| Array of integers, representing the distances between all document pre values ({{Type|Nums}})
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/index/DocIndex.java DocIndex()]
|}
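
The "key/value pairs, terminated by an empty key" pattern used in the {{Code|inf}} file can be sketched in Java as follows. This is a simplified illustration, not the BaseX implementation: the class and method names are hypothetical, every value is assumed to be a plain {{Type|Token}} (the {{Code|PERM}} entry and the index entries have richer values in practice), and the 4-byte and 5-byte {{Type|Num}} forms are assumed from a reading of Num.java.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader (not a BaseX class) for Token/Token pairs with an empty-key finalizer. */
class KeyValueSketch {
  /** Sample input: FNAME=a.xml, SIZE=3, then the empty key (a single 00 length byte). */
  static final byte[] SAMPLE = {
    5, 'F', 'N', 'A', 'M', 'E', 5, 'a', '.', 'x', 'm', 'l',
    4, 'S', 'I', 'Z', 'E', 1, '3',
    0
  };

  /** Reads a compressed integer; the top two bits of the first byte select the length. */
  static int readNum(DataInputStream in) throws IOException {
    int first = in.readUnsignedByte();
    switch (first & 0xC0) {
      case 0x00: // 1 byte
        return first;
      case 0x40: // 2 bytes
        return (first & 0x3F) << 8 | in.readUnsignedByte();
      case 0x80: // 4 bytes (assumed layout)
        return (first & 0x3F) << 24 | in.readUnsignedByte() << 16
            | in.readUnsignedByte() << 8 | in.readUnsignedByte();
      default:   // 5 bytes (assumed layout): marker byte, then a 32-bit value
        return in.readInt();
    }
  }

  /** Reads a token: length as Num, followed by that many UTF-8 bytes. */
  static String readToken(DataInputStream in) throws IOException {
    byte[] bytes = new byte[readNum(in)];
    in.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  /** Reads pairs until an empty key is found; insertion order is kept. */
  static Map<String, String> readPairs(DataInputStream in) throws IOException {
    Map<String, String> pairs = new LinkedHashMap<>();
    while (true) {
      String key = readToken(in);
      if (key.isEmpty()) break;          // empty key acts as finalizer
      pairs.put(key, readToken(in));     // simplification: value is always a Token
    }
    return pairs;
  }

  /** Convenience wrapper for in-memory data. */
  static Map<String, String> parse(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      return readPairs(in);
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(parse(SAMPLE)); // {FNAME=a.xml, SIZE=3}
  }
}
```

Because the pairs appear in no particular order, a reader of this kind has to dispatch on the key rather than rely on position; the sketch collects everything into a map for that reason.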
===Node Table: {{Code|tbl}}, {{Code|tbli}}===
* {{Code|tbl}}: Main database table, stored in blocks.
* {{Code|tbli}}: Database directory, organizing the database blocks.
Some more information on the [[Node Storage|node storage]] is available.
===Texts: {{Code|txt}}, {{Code|atv}}===
* {{Code|txt}}: Heap file for text values (document names, string values of texts, comments and processing instructions)
* {{Code|atv}}: Heap file for attribute values.
===Value Indexes: {{Code|txtl}}, {{Code|txtr}}, {{Code|atvl}}, {{Code|atvr}}===
'''Text Index:'''
* {{Code|txtl}}: Heap file with ID lists.
* {{Code|txtr}}: Index file with references to ID lists.

The '''Attribute Index''' is contained in the files {{Code|atvl}} and {{Code|atvr}}, the '''Token Index''' in {{Code|tokl}} and {{Code|tokr}}. All three indexes use the same layout.

For a more detailed discussion and examples of these file formats, please see [[Index File Structure]].
===Document Path Index: {{Code|pth}}===
This file provides an index of all the document paths in the database. For databases with a large number of paths, it can be quite large, so it is only generated the first time a function requesting a path lookup is run. For databases where path lookups are never used, this file will not exist.

'''Note:''' On Windows/Mac systems this file is case insensitive (all paths are lower case). On UNIX-like systems this file is case sensitive. The behaviour of path lookups will therefore vary between systems, and copying this file between system types may lead to unexpected behaviour.
===ID/Pre Mapping: {{Code|idp}}===
This file is only created if incremental indexing (UPDINDEX) is enabled for a database. It is used to provide a quick lookup of the pre value for a database node id.
===Full-Text Fuzzy Index: {{Code|ftxx}}, {{Code|ftxy}}, {{Code|ftxz}}===
...may soon be reimplemented.