Serialization

From BaseX Documentation
Revision as of 21:36, 20 January 2017 by Keeleleek (talk | contribs) (added section on character mappings with small example from the mailing list)

This page is part of the XQuery Portal. Serialization parameters define how XQuery items and XML nodes are textually output, i.e., serialized. (For input, see Parsers.) They have been formalized in the W3C XQuery Serialization 3.1 document. In BaseX, they can be specified by…

  • including them in the prolog of the XQuery expression,
  • specifying them in the XQuery functions file:write() or fn:serialize(). The serialization parameters are supplied either as
    • children of an <output:serialization-parameters/> element, as defined for the fn:serialize() function, or as
    • a map that contains all key/value pairs: map { "method": "xml", "cdata-section-elements": "div", ... },
  • using the -s flag of the BaseX command-line clients,
  • setting the SERIALIZER option before running a query,
  • setting the EXPORTER option before exporting a database, or
  • setting them as REST query parameters.
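
As a minimal sketch of the two fn:serialize() variants listed above, the following query serializes the same node twice, once with a parameter map and once with an <output:serialization-parameters/> element. The sample element and its content are made up for illustration; the string value "div" for cdata-section-elements follows the map example above, and other processors may expect a QName here.

```xquery
(: The output namespace declaration is optional in BaseX. :)
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

(: Hypothetical sample node, not taken from the documentation. :)
let $node := <div>a &amp; b</div>
return (
  (: Variant 1: parameters passed as a map. :)
  serialize($node, map { "method": "xml", "cdata-section-elements": "div" }),
  (: Variant 2: the same parameters passed as an element. :)
  serialize(
    $node,
    <output:serialization-parameters>
      <output:method value="xml"/>
      <output:cdata-section-elements value="div"/>
    </output:serialization-parameters>
  )
)
```

Both calls return a string. If the parameters are honored as described, the text content of the div element should be wrapped in a CDATA section in both results.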

Parameters

The following table gives a brief summary of all serialization parameters recognized by BaseX. For details, please refer to the official specification.

Parameter | Description | Allowed | Default
method | Specifies the serialization method. xml, xhtml, html, text, json, and adaptive are adopted from the official specification. The methods basex and csv are specific to BaseX (see XQuery Extensions). | xml, xhtml, html, text, json, adaptive, csv, basex | basex
version | Specifies the version of the serialization method. | xml/xhtml: 1.0, 1.1; html: 4.0, 4.01, 5.0 | 1.0
html-version | Specifies the version of the HTML serialization method. | 4.0, 4.01, 5.0 | 4.0
item-separator | Determines a string to be used as item separator. If a separator is specified, the default separation of atomic values with single whitespaces will be skipped. | arbitrary strings, \n, \r\n, \r | empty
encoding | Encoding to be used for outputting the data. | all encodings supported by Java | UTF-8
indent | Adjusts whitespace to make the output more readable. | yes, no | yes
cdata-section-elements | List of elements to be output as CDATA, separated by whitespace. Example: <text><![CDATA[ <> ]]></text> | |
omit-xml-declaration | Omits the XML declaration, which is serialized before the actual query result. Example: <?xml version="1.0" encoding="UTF-8"?> | yes, no | yes
standalone | Prints or omits the "standalone" attribute in the XML declaration. | yes, no, omit | omit
doctype-system | Introduces the output with a document type declaration and the given system identifier. Example: <!DOCTYPE x SYSTEM "entities.dtd"> | |
doctype-public | If doctype-system is specified, adds a public identifier. Example: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> | |
undeclare-prefixes | Undeclares prefixes in XML 1.1. | yes, no | no
normalization-form | Specifies a normalization form. BaseX supports Form C (NFC). | NFC, none | NFC
media-type | Specifies the media type. | | application/xml
parameter-document | Parses the value as XML document with additional serialization parameters (see the Serialization Specification for more details). | |
use-character-maps | Defines character mappings. May only occur in documents parsed with parameter-document. | |
byte-order-mark | Prints a byte-order-mark before starting serialization. | yes, no | no
escape-uri-attributes | Escapes URI information in certain HTML attributes. Example: <a href="%C3%A4%C3%B6%C3%BC">äöü</a> | yes, no | no
include-content-type | Inserts a meta content-type element into the head element if the result is output as HTML. Example: <head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head>. The head element must already exist or nothing will be added. Any existing meta content-type elements will be removed. | yes, no | no
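
To illustrate how parameters from this table are set in the query prolog, here is a minimal sketch that overrides three defaults. The element names are made up for illustration, and the exact output may vary between BaseX versions.

```xquery
(: The output namespace declaration is optional in BaseX. :)
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
(: Use the standard xml method instead of the default basex method. :)
declare option output:method "xml";
(: Print the XML declaration, which is omitted by default. :)
declare option output:omit-xml-declaration "no";
(: Disable indentation, which is enabled by default. :)
declare option output:indent "no";

(: Hypothetical sample data. :)
<root><item>1</item><item>2</item></root>
```

With these settings, the result should start with an XML declaration, followed by the root element without added whitespace between its children.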

BaseX provides some additional serialization parameters:

Parameter | Description | Allowed | Default
csv | Defines how data is serialized as CSV. | see CSV Module |
json | Defines how data is serialized as JSON. | see JSON Module |
tabulator | Uses tab characters (\t) instead of spaces for indenting elements. | yes, no | no
indents | Specifies the number of characters to be indented. | positive number | 2
newline | Specifies the type of newline to be used as end-of-line marker. | \n, \r\n, \r | system dependent
limit | Stops serialization after the specified number of bytes has been serialized. If a negative number is specified, everything will be output. | positive number | -1
binary | Indicates if items of binary type are output in their native byte representation. Only applicable to the basex serialization method. | yes, no | yes
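
The BaseX-specific layout parameters can be combined with the standard ones. The following sketch assumes they are accepted as prolog options in the same way as the standard parameters; the sample element is made up for illustration.

```xquery
(: The output namespace declaration is optional in BaseX. :)
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "xml";
declare option output:indent "yes";
(: Indent nested elements by four characters instead of the default two. :)
declare option output:indents "4";
(: Always end lines with a line feed, regardless of the platform. :)
declare option output:newline "\n";

(: Hypothetical sample data. :)
<root><item>1</item></root>
```

Note that "\n" is an ordinary two-character XQuery string here; it is the serializer that interprets it as a line feed, as listed in the Allowed column above.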

The csv and json parameters take a list of options. Each option name is joined to its value with =, and multiple options are separated by commas:

Query:

(: The output namespace declaration is optional, because it is statically declared in BaseX. :)
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "csv";
declare option output:csv "header=yes, separator=semicolon";
<csv>
  <record>
    <Name>John</Name>
    <City>Newton</City>
  </record>
  <record>
    <Name>Jack</Name>
    <City>Oldtown</City>
  </record>
</csv>

Result:

Name;City
John;Newton
Jack;Oldtown
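
The json parameter is used in the same way. The following sketch assumes the direct conversion format described in the JSON Module, in which the root element is named json and carries a type attribute; the element names below it are made up for illustration.

```xquery
(: The output namespace declaration is optional in BaseX. :)
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "json";
(: Option list in the same name=value syntax as for csv. :)
declare option output:json "format=direct";

(: Hypothetical sample data in the direct format. :)
<json type="object">
  <Name>John</Name>
  <City>Newton</City>
</json>
```

Under these assumptions, the result should be a JSON object with the two string members Name and City. See the JSON Module for the full list of options and formats.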

Character mappings

Character maps allow a specific character in the instance of the data model to be replaced with a specified string of characters during serialization. The substituted string is output "as is", and the serializer performs no checks that the resulting document is well-formed. Character maps may only be defined in documents parsed with parameter-document.

This example outputs the Unicode character U+00A0 (NO-BREAK SPACE) as the numeric character reference &#160; rather than as the entity &nbsp;.

Example query:

(: The output namespace declaration is optional, because it is statically declared in BaseX. :)
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:parameter-document "map.xml";
<x>&#xA0;</x>

Example parameter-document:

<serialization-parameters
   xmlns="http://www.w3.org/2010/xslt-xquery-serialization">
   <use-character-maps>
     <character-map character="&#160;" map-string="&amp;#160;"/>
   </use-character-maps>
</serialization-parameters>

Changelog

Version 8.4
  • Added: Serialization parameter binary.
  • Updated: New serialization method basex. By default, items of binary type are now output in their native byte representation. The method raw was removed.
Version 8.0
  • Added: Support for use-character-maps and parameter-document.
  • Added: Serialization method adaptive.
  • Updated: adaptive is new default method (before: xml).
  • Removed: format, wrap-prefix, wrap-uri.
Version 7.8.2
  • Added: limit: Stops serialization after the specified number of bytes has been serialized.
Version 7.8
  • Added: csv and json serialization parameters.
  • Removed: separator option (use item-separator instead).
Version 7.7.2
  • Added: csv serialization method.
  • Added: temporary serialization methods csv-header, csv-separator, json-unescape, json-spec, json-format.
Version 7.5
  • Added: official item-separator and html-version parameter.
  • Updated: method=html5 removed; serializers updated with the latest version of the specification, using method=html and version=5.0.
Version 7.2
  • Added: separator parameter.
Version 7.1
  • Added: newline parameter.
Version 7.0
  • Added: Serialization parameters added to REST API; JSON/JsonML/raw methods.