Main Page » Advanced User Guide » Node Storage

Node Storage

This article gives some insight into how XML nodes are internally stored.

Node Table

XML nodes are stored in a flat table. The node table of a database can be displayed via the INFO STORAGE command:

$ basex -c"CREATE DB db <xml>HiThere</xml>" -c"INFO STORAGE"

PRE  DIS  SIZ  ATS  ID  NS  KIND  CONTENT
-----------------------------------------
  0    1    3    1   0   0  DOC   db.xml
  1    1    2    1   1   0  ELEM  xml
  2    1    1    1   2   0  TEXT  HiThere

PRE Value

The PRE value of a node represents the order in which the XML nodes are visited by a SAX parser. It is actually not stored in the database; instead, the table position implies it. As a result, it will change whenever a node with a smaller PRE value is added to or deleted from a database.

ID Value

Each database node has a persistent ID value, which remains valid after update operations, and which is referenced by the value indexes. As long as no updates are performed on a database, the PRE and ID values are identical and need not be stored. The values will remain identical as long as new nodes are exclusively added to the end of the database. Once nodes are deleted or inserted somewhere else, the values will diverge, as shown in the next example:

$ basex -c"CREATE DB db <xml>Hi</xml>" -q"insert node <b/> before /xml" -c"INFO STORAGE"

PRE  DIS  SIZ  ATS  ID  NS  KIND  CONTENT
-----------------------------------------
  0    1    4    1   0   0  DOC   db.xml
  1    1    1    1   3   0  ELEM  b
  2    2    2    1   1   0  ELEM  xml
  3    1    1    1   2   0  TEXT  Hi

The db:node-pre and db:node-id functions can be called to retrieve the PRE and ID values of a node, and db:get-pre and db:get-id can be used retrieve the node with the specified value. By default, ID lookups are expensive. If UPDINDEX is turned on, an additional index will be maintained to speed up the process.

Block Storage

The node table is organized as blocks with a fixed maximum number of records. The records within a block are sorted by their PRE value (which, therefore, can be implicitly determined and need not be saved). For each block, the smallest pre value within that block is stored. Since the records are sorted, it will be the PRE value of the first record stored in the block. Since the two maps will not grow excessively large, but are accessed/changed by each read/write operation, they are kept in main memory and only flushed to disk if required. In addition, a bit map tells which physical blocks are free and which not, so that when a new block is needed, an already free one will be reused.


⚡Generated with XQuery