CSV Module

From BaseX Documentation

Revision as of 23:43, 17 October 2013

This XQuery Module contains functions to parse and serialize CSV data. CSV (comma-separated values) is a popular representation for tabular data, exported e.g. from Excel.

Conventions

All functions in this module are assigned to the http://basex.org/modules/csv namespace, which is statically bound to the csv prefix.
All errors are assigned to the http://basex.org/errors namespace, which is statically bound to the bxerr prefix.

Rules

Version 7.7.2: the conversion rules have been updated and aligned with the JSON parser.

CSV is converted to XML as follows:

  1. The resulting XML document has a <csv/> root element.
  2. Rows are represented via <record/> elements.
  3. Fields are represented via <entry/> elements. The value of a field is represented as a text node.
  4. If the header option is set to true, the first text line is parsed as the table header, and the entry elements are replaced with the field names:
    1. Empty names are represented by a single underscore (_), and characters that are not valid in element names are replaced with underscores.
    2. If the lax option is set to false, invalid characters are rewritten to an underscore followed by the character’s four-digit Unicode code point, and underscores are represented as two underscores (__). The resulting element names may be less readable, but can always be converted back to the original field names.
  5. If format is set to attributes, field names are stored in name attributes.

If the CSV parser is selected in the Database Creation dialog of the GUI, a simple example is displayed to show the effects of the available options.
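
The rules above can be tried out with a short query. The following is a minimal sketch, not an official example: the input string is hypothetical, and the <csv:header value='true'/> element is an assumption that follows the pattern of the <csv:separator/> option shown further below.

```xquery
(: Hypothetical two-line input; &#xA; is a line feed. :)
let $input := "Name,City&#xA;Ada,London"
return (
  (: No options: rules 1 to 3 apply, so every field becomes an <entry/>. :)
  csv:parse($input),
  (: header = true (assumed option syntax): rule 4 applies, and the
     first line supplies the element names instead of "entry". :)
  csv:parse($input, <csv:options><csv:header value='true'/></csv:options>)
)
```

According to the rules, the first result should contain two <record/> elements with <entry/> children, whereas the second should contain a single <record/> whose children are named Name and City.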

Functions

csv:parse

Signatures csv:parse($input as xs:string) as element(csv)
csv:parse($input as xs:string, $options as item()) as element(csv)
Summary Converts the CSV data specified by $input to XML, and returns the result as an element(csv) value.
The $options argument can be used to control the way the input is converted. The following options are available:
  • separator defines the character which separates columns in a row. By default, this is a comma (,).
  • header specifies if the input contains a header row. The default value is false.

Options can either be specified

  • as children of a <csv:options/> element; e.g.:
<csv:options>
  <csv:separator value=';'/>
  ...
</csv:options>
  • or as a map, which contains all key/value pairs:
{ 'separator' : ';', ... }
Errors BXCS0001: the input cannot be converted.
BXCS0003: the specified separator must be a single character.
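
As a minimal sketch of the two-argument signature, the following query parses semicolon-separated input. The input string is hypothetical; the options element uses the syntax shown above.

```xquery
(: Hypothetical input with two rows, fields separated by semicolons. :)
csv:parse(
  "Ada;London&#xA;Grace;New York",
  <csv:options>
    <csv:separator value=';'/>
  </csv:options>
)
```

Since no header option is given, the fields of both rows should be returned as <entry/> elements.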

csv:serialize

Signatures csv:serialize($input as node(), $options as item()) as xs:string
Summary Serializes the node specified by $input as CSV data, and returns the result as xs:string.
XML documents can also be serialized as CSV if the Serialization Option method is set to csv.
The $options argument can be used to control the way the node is serialized. The following options are available:
  • separator defines the character which separates columns in a row. By default, this is a comma (,).
  • header specifies if the input element names are to be interpreted as header names. The default value is false.

Options can either be specified

  • as children of a <csv:options/> element; e.g.:
<csv:options>
  <csv:separator value=';'/>
  ...
</csv:options>
  • or as a map, which contains all key/value pairs:
{ 'separator' : ';', ... }
Errors BXCS0002: the input cannot be serialized.
BXCS0003: the specified separator must be a single character.
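
The following sketch serializes a small element as CSV. The element content is hypothetical and is assumed to follow the structure described in the Rules section; the options element uses the syntax shown above.

```xquery
(: Hypothetical input in the <csv>/<record>/<entry> structure. :)
let $csv :=
  <csv>
    <record><entry>Ada</entry><entry>London</entry></record>
    <record><entry>Grace</entry><entry>New York</entry></record>
  </csv>
return csv:serialize(
  $csv,
  <csv:options><csv:separator value=';'/></csv:options>
)
```

The returned string should contain one line per <record/> element, with the fields separated by semicolons.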

Errors

Code Description
BXCS0001 The input cannot be converted.
BXCS0002 The node cannot be serialized.
BXCS0003 The specified separator must be a single character.
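
Errors of this module can be caught with an XQuery 3.0 try/catch expression. The following is a sketch under assumptions: the input and the two-character separator are hypothetical, and it is assumed that such a separator raises BXCS0003, as the description of that code suggests.

```xquery
try {
  (: ';;' consists of two characters and is thus expected to be rejected. :)
  csv:parse(
    "Ada;London",
    <csv:options><csv:separator value=';;'/></csv:options>
  )
} catch bxerr:BXCS0003 {
  (: $err:description is the standard error message variable of a catch clause. :)
  concat("Invalid separator: ", $err:description)
}
```

The bxerr prefix does not need to be declared, as it is statically bound (see Conventions).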

Changelog

The module was introduced with Version 7.7.2.