464 bytes removed ,  16:55, 10 April 2019
This page is provided to help those who are interested in the specific file format of the index files used by BaseX. It was predominantly written to aid research into the reasons for ever-increasing file size when using the {{Option|UPDINDEX}} option.
== Attribute Index Files ==
*The total number of attribute values
*The number of times each attribute value appears
*The <code>ID</code> numbers (offsets) of each occurrence of each attribute value
In the example below we have the file for the database used in the atvr.basex example above. The first four bytes provide a big-endian integer value of the total number of different attribute values in the index - in this case 4.
The remainder of the file is made up of ID lists. Each list starts on one of the bytes referenced from atvr.basex - in our example there is a list starting on the byte at position 8 (counting from 0). The first item in the list is a count of the number of attributes with this value - in our case it's 1. The list then contains the locations of the attributes - in our case there is only one attribute and it's at position 8. This means that it is offset 8 positions from the beginning of the database (use the [[Commands#INFO_STORAGE|INFO STORAGE]] command to view the order).
<code>00 00 00 04</code><code>[01] 02</code><code>[01] 05</code>'''<code>[01] 08</code>'''<code>[01] 0B</code>
<span style="background-color:#F4A460"><code>[02] 0B 03</code></span>
The header tells us that there are 4 attribute values but we can see there are 5 ID lists in the file. One has become orphaned - a new longer list was required to include the newly added attribute and has been appended to the end of the file.
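The layout described above can be checked with a short Python sketch. It is only an illustration built on the example bytes shown on this page: it assumes that every count and every ID fits in a single byte and that the lists are packed back to back, which holds for the example but is not claimed for real index files. The function names are hypothetical and are not part of BaseX.

```python
import struct

# Bytes of the example file: 4-byte header followed by five ID lists.
DATA = bytes.fromhex("00000004" "0102" "0105" "0108" "010B" "020B03")

def read_header(data):
    """Return the big-endian count of distinct attribute values."""
    return struct.unpack(">I", data[:4])[0]

def read_list(data, offset):
    """Read one ID list: a count byte followed by that many ID bytes.

    Assumption: counts and IDs are single bytes, as in the example.
    """
    count = data[offset]
    return list(data[offset + 1 : offset + 1 + count])

def scan_lists(data):
    """Walk the file after the header and return (offset, ids) pairs.

    Assumption: lists are stored contiguously with no gaps.
    """
    lists = []
    pos = 4
    while pos < len(data):
        ids = read_list(data, pos)
        lists.append((pos, ids))
        pos += 1 + len(ids)
    return lists

header = read_header(DATA)
lists = scan_lists(DATA)
orphaned = len(lists) - header  # lists present but no longer counted
```

For the example bytes, the header reports 4 values, the scan finds 5 lists, and the difference of 1 corresponds to the orphaned list mentioned above.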
In versions of BaseX prior to 8.0, when items are deleted and a shorter list is required, it is updated in place. When items are added and a longer list is required, the new list is always added at the end of the file. Over a period of time the file will grow - running the [[Commands#OPTIMIZE|OPTIMIZE]] command will recreate the index from scratch and recover the lost space. From BaseX 8.0, some optimisations have been applied: while a database is open, a list of free spaces is maintained, and a new list will only be added to the end of the file if there isn't a free space available that is large enough. However, this list of free spaces is lost when the database is closed, so future operations will not be aware of any free space once the database is reopened. This, and the fact that small spaces (single bytes, for example) are unlikely to be filled, means that the index file may still grow larger than it needs to be. This space can be recovered, as before, by running [[Commands#OPTIMIZE|OPTIMIZE]].
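The growth behaviour described in the previous paragraph can be illustrated with a toy model. This is not BaseX code: the class <code>IndexFileModel</code>, its method names and the first-fit search are assumptions made purely for illustration. It only tracks the file size and the in-memory list of free gaps.

```python
class IndexFileModel:
    """Toy model of the allocation behaviour described above (hypothetical)."""

    def __init__(self):
        self.size = 0    # current length of the index file in bytes
        self.free = []   # (offset, length) gaps known in this session only

    def allocate(self, length):
        """Return an offset for a list of `length` bytes.

        Reuses the first gap that is large enough (first-fit is an
        assumption); otherwise appends to the end of the file.
        """
        for i, (off, gap) in enumerate(self.free):
            if gap >= length:
                rest = gap - length
                if rest:
                    self.free[i] = (off + length, rest)
                else:
                    del self.free[i]
                return off
        off = self.size
        self.size += length
        return off

    def release(self, offset, length):
        """Record the space of a replaced list as free."""
        self.free.append((offset, length))

    def reopen(self):
        """Closing and reopening the database forgets all known gaps."""
        self.free.clear()


model = IndexFileModel()
a = model.allocate(2)   # first list, appended at offset 0
model.release(a, 2)     # the list must grow, so its old space becomes free
b = model.allocate(3)   # gap of 2 is too small, so the list is appended
c = model.allocate(2)   # fits into the gap left behind, file does not grow
model.release(c, 2)
model.reopen()          # free list is lost
d = model.allocate(2)   # appended, although a gap physically exists
```

In this run the file stays at 5 bytes while the gap is known and grows to 7 bytes after the simulated reopen, which mirrors why the index can still become larger than necessary.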
== Value Index Files ==
These files, txtr.basex and txtl.basex, work in the same way as the attribute index files, but with references to text nodes instead of attributes.