Difference between revisions of "CSV Module"

From BaseX Documentation
Jump to navigation Jump to search
Line 147: Line 147:
 
let $csv := csv:serialize($xml, $options)
 
let $csv := csv:serialize($xml, $options)
 
return $text eq $csv
 
return $text eq $csv
 +
</pre>
 +
 +
'''Example 3:''' Converts CSV data to an XQuery map item and serializes its contents:
 +
 +
'''Query:'''
 +
<pre class="brush:xquery">
 +
let $text := "Name;City" || out:nl() || "John;Newton"
 +
let $options := { 'separator': ';', 'format': 'map' }
 +
let $map := csv:parse($text, $options)
 +
return map:serialize($map)
 +
</pre>
 +
 +
'''Result:'''
 +
<pre class="brush:xml">
 +
{
 +
  1: ("Name", "City"),
 +
  2: ("John", "Newton")
 +
}
 
</pre>
 
</pre>
  

Revision as of 00:50, 20 October 2013

This XQuery Module contains a single function to parse CSV input. CSV (comma-separated values) is a popular representation for tabular data, exported e. g. from Excel.

Conventions

All functions in this module are assigned to the http://basex.org/modules/csv namespace, which is statically bound to the csv prefix.
All errors are assigned to the http://basex.org/errors namespace, which is statically bound to the bxerr prefix.

Conversion

XML: Direct, Attributes

CSV is converted to XML as follows:

  • The resulting XML document has a <csv> root element.
  • Rows are represented via <record> elements.
  • Fields are represented via <entry> elements. The value of a field is represented as text node.
  • If the header option is set to true, the first text line is parsed as table header, and the <entry> elements are replaced with the field names:
    • Empty names are represented by a single underscore (_), and characters that are not valid in element names are replaced with underscores.
    • If the lax option is set to false, invalid characters will be rewritten to an underscore and the character’s four-digit Unicode, and underscores will be represented as two underscores (__). The resulting element names may be less readable, but can always be converted back to the original field names.
  • If format is set to attributes, field names will be stored in name attributes.

In the Database Creation dialog of the GUI, when the CSV parser is selected, the Parsing tab demonstrates the conversion of CSV to XML and the effects of the single conversion options.

Map

If format is set to map, the CSV data will be converted to an XQuery map:

  • All records are enumerated with positive integers.
  • By default, all entries of a records are represented in a sequence.
  • If the header option is set to true, a map is created, which contains all field names and its values.

Options

The following options are available:

Option Description Allowed Default
separator Defines the character which separates the entries of a record in a single line. comma, semicolon, colon, tab, space or a single character comma
header Indicates if the first line of the parsed or serialized CSV data is a table header. yes, no no
format Specifies the format of the XML data. The format is only relevant if the header option is activated:
  • With direct conversion, field names are represented as element names
  • With attributes conversion, field names are stored in name attributes
  • With map conversion, the input is converted to an XQuery map
direct, attributes, map direct
lax Specifies if a lax approach is used to convert QNames to JSON names. yes, no yes

The CSV function signatures provide an $options argument. Options can either be specified

  • as children of an <csv:options/> element; e.g.:
<csv:options>
  <csv:separator value=';'/>
  ...
</csv:options>
  • or as map, which contains all key/value pairs:
{ 'separator': ';', ... }

Functions

csv:parse

Version 7.8: the return type has been changed from element(<csv>) to document-node(element(<csv>)), and the format and lax options have been added.

Signatures csv:parse($input as xs:string) as document-node(element(csv))
csv:parse($input as xs:string, $options as item()) as document-node(element(csv))
Summary Converts the CSV data specified by $input to XML, and returns the result as <csv/> value. The $options argument can be used to control the way the input is converted.
Errors BXCS0001: the input cannot be parsed.

csv:serialize

Signatures csv:serialize($input as node(), $options as item()) as xs:string
Summary Serializes the node specified by $input as CSV data, and returns the result as xs:string.
XML documents can also be serialized as CSV if the Serialization parameter method is set to csv.
With the $options argument, the way the node is serialized can be controlled.
Errors BXCS0002: the input cannot be serialized.

Examples

Example 1: Converts CSV data to XML, interpreting the first row as table header:

Input addressbook.csv:

Name,First Name,Address,City
Huber,Sepp,Hauptstraße 13,93547 Hintertupfing

Query:

let $text := file:read-text('addressbook.csv')
return csv:parse($text, { 'header': 'true' })

Result:

<csv>
  <record>
    <Name>Huber</Name>
    <First_Name>Sepp</First_Name>
    <Address>Hauptstraße 13</Address>
    <City>93547 Hintertupfing</City>
  </record>
</csv>

Example 2: Converts some CSV data to XML and back, and checks if the input and output are equal. The expected result is true:

Query:

let $text := file:read-text('some-data.csv')
let $options := { 'lax': 'no' }
let $xml := csv:parse($text, $options)
let $csv := csv:serialize($xml, $options)
return $text eq $csv

Example 3: Converts CSV data to an XQuery map item and serializes its contents:

Query:

let $text := "Name;City" || out:nl() || "John;Newton"
let $options := { 'separator': ';', 'format': 'map' }
let $map := csv:parse($text, $options)
return map:serialize($map)

Result:

{
  1: ("Name", "City"),
  2: ("John", "Newton")
}

Errors

Code Description
BXCS0001 The input cannot be parsed.
BXCS0002 The node cannot be serialized.

Changelog

Version 7.8
  • Updated: return type of csv:parse changed from element(<csv>) to document-node(element(<csv>))
  • Added: format and lax options

The module was introduced with Version 7.7.2.