1,447 bytes added, 19:06, 5 February 2016
This article is part of the [[Advanced User's Guide]]. It presents some low-level details on how data is stored in the database files.

=Data Types=

The following data types are used for specifying the storage layout:

{| class="wikitable" width="100%"
|-
! Type
! Description
! Example (native → hex integers)
|-
| {{Type|Num}}
| Compressed integer (1-5 bytes), specified in [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/Num.java Num.java]
| {{Code|15}} → {{Code|0F}}; {{Code|511}} → {{Code|41 FF}}
|-
| {{Type|Token}}
| Length ({{Type|Num}}) and bytes of UTF8 byte representation
| {{Code|Hello}} → {{Code|05 48 65 6c 6c 6f}}
|-
| {{Type|Double}}
| Number, stored as token
| {{Code|123}} → {{Code|03 31 32 33}}
|-
| {{Type|Boolean}}
| Boolean (1 byte, {{Code|00}} or {{Code|01}})
| {{Code|true}} → {{Code|01}}
|-
| {{Type|Nums}}, {{Type|Tokens}}, {{Type|Doubles}}
| Arrays of values, introduced with the number of entries
| {{Code|1,2}} → {{Code|02 01 31 01 32}}
|-
| {{Type|TokenSet}}
| Key array ({{Type|Tokens}}), next/bucket/size arrays (3x {{Type|Nums}})
|
|}

=Database Files=

The following tables illustrate the layout of the BaseX database files. All files are suffixed with {{Code|.basex}}.

==Meta Data, Name/Path/Doc Indexes: {{Code|inf}}==
{| class="wikitable" width="100%"
! Method
|-
| valign='top' | '''1. Meta Data'''
| valign='top' | 1. Key/value pairs, in no particular order ({{Type|Token}}/{{Type|Token}}):<br />&nbsp; &bull; Examples: {{Code|FNAME}}, {{Code|TIME}}, {{Code|SIZE}}, ...<br />&nbsp; &bull; {{Code|PERM}} → Number of users ({{Type|Num}}), and name/password/permission values for each user ({{Type|Token}}/{{Type|Token}}/{{Type|Num}})<br/>2. Empty key as finalizer
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/DiskData.java DiskData()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/MetaData.java MetaData()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/core/Users.java Users()]
|-
| valign='top' | '''2. Main memory indexes'''
| 1. Key/value pairs, in no particular order ({{Type|Token}}/{{Type|Token}}):<br />&nbsp; &bull; {{Code|TAGS}} → Element Name Index<br />&nbsp; &bull; {{Code|ATTS}} → Attribute Name Index<br />&nbsp; &bull; {{Code|PATH}} → Path Index<br />&nbsp; &bull; {{Code|NS}} → Namespaces<br />&nbsp; &bull; {{Code|DOCS}} → Document Index<br/>2. Empty key as finalizer
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/DiskData.java DiskData()]
|-
| valign='top' | '''2 a) Name Index'''<br/>Element/attribute names
| 1. Token set, storing all names ({{Type|TokenSet}})<br />2. One StatsKey instance per entry:<br/>2.1. Content kind ({{Type|Num}}):<br />2.1.1. Number: min/max ({{Type|Doubles}})<br />2.1.2. Category: number of entries ({{Type|Num}}), entries ({{Type|Tokens}})<br />2.2. Number of entries ({{Type|Num}})<br />2.3. Leaf flag ({{Type|Boolean}})<br />2.4. Maximum text length ({{Type|Double}}; legacy, could be {{Type|Num}})
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/index/Names.java Names()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/hash/TokenSet.java TokenSet.read()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/index/StatsKey.java StatsKey()]
|-
| valign='top' | '''2 b) Path Index'''
| 1. Flag for path definition ({{Type|Boolean}}, always {{Code|true}}; legacy)<br/>2. PathNode:<br/>2.1. Name reference ({{Type|Num}})<br/>2.2. Node kind ({{Type|Num}})<br/>2.3. Number of occurrences ({{Type|Num}})<br/>2.4. Number of children ({{Type|Num}})<br/>2.5. {{Type|Double}}; legacy, can be reused or discarded<br/>2.6. Recursive generation of child nodes (→ 2)
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/index/path/PathSummary.java PathSummary()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/index/path/PathNode.java PathNode()]
|-
| valign='top' | '''2 c) Namespaces'''
| 1. Token set, storing prefixes ({{Type|TokenSet}})<br/>2. Token set, storing URIs ({{Type|TokenSet}})<br/>3. NSNode:<br/>3.1. pre value ({{Type|Num}})<br/>3.2. References to prefix/URI pairs ({{Type|Nums}})<br/>3.3. Number of children ({{Type|Num}})<br/>3.4. Recursive generation of child nodes (→ 3)
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/Namespaces.java Namespaces()]<br/>[https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/NSNode.java NSNode()]
|-
| valign='top' | '''2 d) Document Index'''
| Array of integers, representing the distances between all document pre values ({{Type|Nums}})
| valign='top' | [https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/index/DocIndex.java DocIndex()]
|}
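
The basic data types and the key/value layout of the {{Code|inf}} file can be illustrated with a minimal Java sketch. It is not taken from the BaseX sources: the class {{Code|StorageSketch}} and all of its method names are hypothetical, and the sample keys and values are made up. The {{Type|Num}} encoding assumes that the two highest bits of the first byte select the total length (1, 2, 4 or 5 bytes), which agrees with the examples {{Code|0F}} and {{Code|41 FF}} above; the linked {{Code|Num.java}} remains the authoritative definition. The pair reader also assumes that every value is a plain {{Type|Token}}, which does not hold for structured entries such as {{Code|PERM}} or the index sections.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical and not part of the BaseX API. */
final class StorageSketch {
  private final byte[] data;
  private int pos; // position of the next byte to read

  StorageSketch(byte[] data) { this.data = data; }

  /** Encodes a Num. Assumption: the two high bits of the first byte select the length. */
  static byte[] num(int v) {
    if (v < 0 || v > 0x3FFFFFFF) { // 5 bytes: marker C0, then the full 32-bit value
      return new byte[] { (byte) 0xC0, (byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v };
    }
    if (v > 0x3FFF) { // 4 bytes: bits 10, then 30 value bits
      return new byte[] { (byte) (v >>> 24 | 0x80), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v };
    }
    if (v > 0x3F) { // 2 bytes: bits 01, then 14 value bits
      return new byte[] { (byte) (v >>> 8 | 0x40), (byte) v };
    }
    return new byte[] { (byte) v }; // 1 byte: bits 00, then 6 value bits
  }

  /** Encodes a Token: length as Num, followed by the UTF-8 bytes. */
  static byte[] token(String s) {
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    return concat(num(utf8.length), utf8);
  }

  /** Concatenates byte arrays. */
  static byte[] concat(byte[]... parts) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] part : parts) out.write(part, 0, part.length);
    return out.toByteArray();
  }

  /** Formats bytes as upper-case hex pairs separated by spaces. */
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(String.format("%02X", b & 0xFF));
    }
    return sb.toString();
  }

  private int next() { return data[pos++] & 0xFF; }

  /** Decodes a Num at the current position (inverse of num()). */
  int readNum() {
    int b = next();
    switch (b >>> 6) {
      case 0: return b;
      case 1: return (b & 0x3F) << 8 | next();
      case 2: return (b & 0x3F) << 24 | next() << 16 | next() << 8 | next();
      default: return next() << 24 | next() << 16 | next() << 8 | next();
    }
  }

  /** Decodes a Token at the current position. */
  String readToken() {
    int len = readNum();
    String s = new String(data, pos, len, StandardCharsets.UTF_8);
    pos += len;
    return s;
  }

  /** Reads Token/Token pairs until the empty key that acts as finalizer. */
  Map<String, String> readPairs() {
    Map<String, String> pairs = new LinkedHashMap<>();
    for (String key = readToken(); !key.isEmpty(); key = readToken()) {
      pairs.put(key, readToken());
    }
    return pairs;
  }

  public static void main(String[] args) {
    System.out.println(hex(num(15)));        // 0F
    System.out.println(hex(num(511)));       // 41 FF
    System.out.println(hex(token("Hello"))); // 05 48 65 6C 6C 6F
    // Made-up meta data section: two pairs, terminated by an empty key.
    byte[] section = concat(token("FNAME"), token("a.xml"), token("SIZE"), token("3"), token(""));
    System.out.println(new StorageSketch(section).readPairs());
  }
}
```

A real reader would dispatch on the key before reading the value, since keys such as {{Code|PERM}}, {{Code|TAGS}} or {{Code|PATH}} are followed by the structured layouts listed in the table rather than by a single token.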
==Node Table: {{Code|tbl}}, {{Code|tbli}}==

* {{Code|tbl}}: Main database table, stored in blocks.
* {{Code|tbli}}: Database directory, organizing the database blocks.

Some more information on the [[Node Storage|node storage]] is available.

==Texts: {{Code|txt}}, {{Code|atv}}==

* {{Code|txt}}: Heap file for text values (document names, string values of texts, comments and processing instructions)
* {{Code|atv}}: Heap file for attribute values.

==Value Indexes: {{Code|txtl}}, {{Code|txtr}}, {{Code|atvl}}, {{Code|atvr}}==

'''Text Index:'''
* {{Code|txtl}}: Heap file with ID lists.
* {{Code|txtr}}: Index file with references to ID lists.

The '''Attribute Index''' is contained in the files {{Code|atvl}} and {{Code|atvr}}, the '''Token Index''' in {{Code|tokl}} and {{Code|tokr}}. All have the same layout.

For a more detailed discussion and examples of these file formats please see [[Index File Structure]].

==Full-Text Fuzzy Index: {{Code|ftxx}}, {{Code|ftxy}}, {{Code|ftxz}}==

...may soon be reimplemented.