Changes

3,131 bytes added ,  00:52, 13 December 2020
Added information regarding pth.basex and idp.basex
This article is part of the [[Advanced User's Guide]]. It presents some low-level details on how data is stored in the database files.
=Data Types=

The following data types are used for specifying the storage layout:

{| class="wikitable" width="100%"
|-
! Type
! Description
! Example (native → hex integers)
|-
| {{Type|Num}}
| Compressed integer (1-5 bytes), specified in [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/Num.java Num.java]
| {{Code|15}} → {{Code|0F}}; {{Code|511}} → {{Code|41 FF}}
|-
| {{Type|Token}}
| Length ({{Type|Num}}) and bytes of the UTF-8 byte representation
| {{Code|Hello}} → {{Code|05 48 65 6c 6c 6f}}
|-
| {{Type|Double}}
| Number, stored as token
| {{Code|123}} → {{Code|03 31 32 33}}
|-
| {{Type|Boolean}}
| Boolean (1 byte, {{Code|00}} or {{Code|01}})
| {{Code|true}} → {{Code|01}}
|-
| {{Type|Nums}}, {{Type|Tokens}}, {{Type|Doubles}}
| Arrays of values, introduced with the number of entries
| {{Code|1,2}} → {{Code|02 01 31 01 32}}
|-
| {{Type|TokenSet}}
| Key array ({{Type|Tokens}}), next/bucket/size arrays (3x {{Type|Nums}})
|
|}

=Database Files=

The following tables illustrate the layout of the BaseX database files. All files are suffixed with {{Code|.basex}}.

==Meta Data, Name/Path/Doc Indexes: {{Code|inf}}==
{| class="wikitable" width="100%"
|-
! Description
! Format
! Method
|-
| valign='top' | '''1. Meta Data'''
| 1. Key/value pairs, in no particular order ({{Type|Token}}/{{Type|Token}}):<br/>&nbsp; &bull; Examples: {{Code|FNAME}}, {{Code|TIME}}, {{Code|SIZE}}, ...<br/>&nbsp; &bull; {{Code|PERM}} → Number of users ({{Type|Num}}), and name/password/permission values for each user ({{Type|Token}}/{{Type|Token}}/{{Type|Num}})<br/>2. Empty key as finalizer
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/DiskData.java DiskData()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/MetaData.java MetaData()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/core/Users.java Users()]
|-
| valign='top' | '''2. Main memory indexes'''
| 1. Key/value pairs, in no particular order ({{Type|Token}}/{{Type|Token}}):<br/>&nbsp; &bull; {{Code|TAGS}} → Element Name Index<br/>&nbsp; &bull; {{Code|ATTS}} → Attribute Name Index<br/>&nbsp; &bull; {{Code|PATH}} → Path Index<br/>&nbsp; &bull; {{Code|NS}} → Namespaces<br/>&nbsp; &bull; {{Code|DOCS}} → Document Index<br/>2. Empty key as finalizer
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/DiskData.java DiskData()]
|-
| valign='top' | '''a) Name Index'''<br/>Element/attribute names
| 1. Token set, storing all names ({{Type|TokenSet}})<br/>2. One StatsKey instance per entry:<br/>2.1. Content kind ({{Type|Num}}):<br/>2.1.1. Number: min/max ({{Type|Doubles}})<br/>2.1.2. Category: number of entries ({{Type|Num}}), entries ({{Type|Tokens}})<br/>2.2. Number of entries ({{Type|Num}})<br/>2.3. Leaf flag ({{Type|Boolean}})<br/>2.4. Maximum text length ({{Type|Double}}; legacy, could be {{Type|Num}})
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/index/Names.java Names()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/hash/TokenSet.java TokenSet.read()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/index/StatsKey.java StatsKey()]
|-
| valign='top' | '''b) Path Index'''
| 1. Flag for path definition ({{Type|Boolean}}, always {{Code|true}}; legacy)<br/>2. PathNode:<br/>2.1. Name reference ({{Type|Num}})<br/>2.2. Node kind ({{Type|Num}})<br/>2.3. Number of occurrences ({{Type|Num}})<br/>2.4. Number of children ({{Type|Num}})<br/>2.5. {{Type|Double}}; legacy, can be reused or discarded<br/>2.6. Recursive generation of child nodes (→ 2)
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/index/path/PathSummary.java PathSummary()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/index/path/PathNode.java PathNode()]
|-
| valign='top' | '''c) Namespaces'''
| 1. Token set, storing prefixes ({{Type|TokenSet}})<br/>2. Token set, storing URIs ({{Type|TokenSet}})<br/>3. NSNode:<br/>3.1. pre value ({{Type|Num}})<br/>3.2. References to prefix/URI pairs ({{Type|Nums}})<br/>3.3. Number of children ({{Type|Num}})<br/>3.4. Recursive generation of child nodes (→ 3)
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/Namespaces.java Namespaces()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/NSNode.java NSNode()]
|-
| valign='top' | '''d) Document Index'''
| Array of integers, representing the distances between all document pre values ({{Type|Nums}})
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/index/DocIndex.java DocIndex()]
|}
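
The {{Type|Num}} and {{Type|Token}} layouts and the key/value structure of the meta data section can be illustrated with a small Python sketch. It is not taken from the BaseX sources: the function names are hypothetical, and the bit layout of {{Type|Num}} (two leading bits selecting a 1, 2, 4 or 5 byte form) is an assumption based on a reading of Num.java that should be checked against the linked class. The reader only handles plain {{Type|Token}}/{{Type|Token}} pairs; entries such as {{Code|PERM}} or the index keys carry structured values and would need their own parsers.

```python
import io


def write_num(value: int) -> bytes:
    """Encode an unsigned 32-bit integer in the assumed compressed Num layout."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("Num holds unsigned 32-bit values")
    if value <= 0x3F:            # 00xxxxxx
        return bytes([value])
    if value <= 0x3FFF:          # 01xxxxxx xxxxxxxx
        return bytes([0x40 | (value >> 8), value & 0xFF])
    if value <= 0x3FFFFFFF:      # 10xxxxxx followed by three bytes
        return bytes([0x80 | (value >> 24)]) + (value & 0xFFFFFF).to_bytes(3, "big")
    return bytes([0xC0]) + value.to_bytes(4, "big")  # 11000000 + four bytes


def read_num(stream) -> int:
    """Decode one compressed Num from a binary stream."""
    first = stream.read(1)
    if not first:
        raise EOFError("unexpected end of data")
    head = first[0]
    kind = head >> 6             # the two leading bits select the form
    if kind == 0:
        return head
    extra = {1: 1, 2: 3, 3: 4}[kind]
    rest = stream.read(extra)
    if len(rest) != extra:
        raise EOFError("truncated Num")
    if kind == 3:
        return int.from_bytes(rest, "big")
    return int.from_bytes(bytes([head & 0x3F]) + rest, "big")


def write_token(text: str) -> bytes:
    """Encode a Token: length as Num, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    return write_num(len(data)) + data


def read_token(stream) -> str:
    """Decode one Token from a binary stream."""
    length = read_num(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("truncated Token")
    return data.decode("utf-8")


def read_pairs(stream) -> dict:
    """Read Token/Token pairs until the empty key that acts as finalizer."""
    pairs = {}
    while True:
        key = read_token(stream)
        if key == "":
            return pairs
        pairs[key] = read_token(stream)


print(write_num(511).hex(" "))        # 41 ff, as in the table above
print(write_token("Hello").hex(" "))  # 05 48 65 6c 6c 6f
```

Passing an <code>io.BytesIO</code> object (or a file opened in binary mode) to <code>read_pairs</code> returns the pairs up to the first empty key.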
 
==Node Table: {{Code|tbl}}, {{Code|tbli}}==
 
* {{Code|tbl}}: Main database table, stored in blocks.
* {{Code|tbli}}: Database directory, organizing the database blocks.
 
Some more information on the [[Node Storage|node storage]] is available.
 
==Texts: {{Code|txt}}, {{Code|atv}}==
 
* {{Code|txt}}: Heap file for text values (document names, string values of texts, comments and processing instructions).
* {{Code|atv}}: Heap file for attribute values.
 
==Value Indexes: {{Code|txtl}}, {{Code|txtr}}, {{Code|atvl}}, {{Code|atvr}}==
 
'''Text Index:'''
* {{Code|txtl}}: Heap file with ID lists.
* {{Code|txtr}}: Index file with references to ID lists.
The '''Attribute Index''' is contained in the files {{Code|atvl}} and {{Code|atvr}}, the '''Token Index''' in {{Code|tokl}} and {{Code|tokr}}. All have the same layout.
 
For a more detailed discussion and examples of these file formats, please see [[Index File Structure]].
 
==Document Path Index: {{Code|pth}}==
 
Provides an index of all document paths in the database. Because this file can become quite large for databases with many paths, it is only generated the first time a function that requires a path lookup is run. For databases where path lookups are never used, the file will not exist.
 
'''Note:''' On Windows and Mac systems, this file is case-insensitive (all paths are stored in lower case). On UNIX-like systems, it is case-sensitive. The behaviour of path lookups therefore varies between systems, and copying this file between system types may lead to unexpected behaviour.
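
The following Python sketch illustrates why such a file does not transfer cleanly between system types. It is a toy model, not the actual file format: <code>normalize_path</code>, <code>build_path_index</code> and <code>lookup</code> are hypothetical helpers, and the only behaviour taken from the note above is that paths are lower-cased on case-insensitive systems.

```python
def normalize_path(path: str, case_sensitive: bool) -> str:
    """Return the key under which a document path is stored or looked up."""
    return path if case_sensitive else path.lower()


def build_path_index(documents, case_sensitive: bool) -> dict:
    """Map each normalized path to the pre values of the matching documents."""
    index = {}
    for pre, path in documents:
        index.setdefault(normalize_path(path, case_sensitive), []).append(pre)
    return index


def lookup(index: dict, path: str, case_sensitive: bool) -> list:
    """Look up a path using the rules of the system doing the lookup."""
    return index.get(normalize_path(path, case_sensitive), [])


documents = [(0, "Docs/A.xml"), (7, "docs/b.xml")]

# Index built under case-insensitive rules: every key is lower case.
insensitive_index = build_path_index(documents, case_sensitive=False)

# Same system: the lookup is normalized in the same way and succeeds.
print(lookup(insensitive_index, "Docs/A.xml", case_sensitive=False))

# Index copied to a case-sensitive system: the original spelling is no
# longer found, because only the lower-case key exists.
print(lookup(insensitive_index, "Docs/A.xml", case_sensitive=True))
```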
 
==ID/Pre Mapping: {{Code|idp}}==
 
This file is only created if incremental indexing ({{Code|UPDINDEX}}) is enabled for a database. It provides a quick lookup of the pre value for a database node ID.
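
The purpose of such a mapping can be shown with a toy model in Python. It assumes that node IDs stay stable while pre values (positions in document order) shift when nodes are inserted. The class <code>IdPreMap</code> below is a hypothetical illustration and says nothing about how the file is laid out on disk or how BaseX implements the structure.

```python
class IdPreMap:
    """Toy ID-to-pre mapping: IDs are stable, pre values follow document order."""

    def __init__(self, ids_in_pre_order):
        # The position of an ID in the list is its current pre value.
        self._pre_of = {node_id: pre for pre, node_id in enumerate(ids_in_pre_order)}

    def pre(self, node_id: int) -> int:
        """Return the current pre value of a node ID without scanning the table."""
        return self._pre_of[node_id]

    def insert(self, pre: int, node_id: int) -> None:
        """Insert a new node at position pre; all following nodes move by one."""
        for other, other_pre in self._pre_of.items():
            if other_pre >= pre:
                self._pre_of[other] = other_pre + 1
        self._pre_of[node_id] = pre


mapping = IdPreMap([0, 1, 2])   # initially, ID and pre value coincide
mapping.insert(1, 3)            # new node with ID 3 becomes the second node
print(mapping.pre(3), mapping.pre(1), mapping.pre(2))
```

This naive version touches every entry on insertion; a production structure would need to avoid that cost.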
 
==Full-Text Fuzzy Index: {{Code|ftxx}}, {{Code|ftxy}}, {{Code|ftxz}}==
 
...may soon be reimplemented.