CSV Module

This XQuery Module contains a single function to parse CSV input. CSV (comma-separated values) is a popular representation for tabular data, exported e. g. from Excel.

=Conventions=

All functions and errors in this module are assigned to the  namespace, which is statically bound to the csv prefix.

XML: Direct, Attributes
If the direct or attributes format is chosen, a CSV string is converted to XML:


 * The resulting XML document has a root element.
 * Rows are represented via elements.
 * Fields are represented via elements. The value of a field is represented as text node.
 * If the header option is set to true, the first text line is parsed as table header, and the elements are replaced with the field names:
 * Empty names are represented by a single underscore (_), and characters that are not valid in element names are replaced with underscores or (when invalid as first character of an element name) prefixed with an underscore.
 * If the lax option is set to false, invalid characters will be rewritten to an underscore and the character’s four-digit Unicode, and underscores will be represented as two underscores (__). The resulting element names may be less readable, but can always be converted back to the original field names.
 * If format is set to attributes, field names will be stored in name attributes.

A little advice: in the Database Creation dialog of the GUI, if you select CSV Parsing and switch to the Parsing tab, you can see the effects of some of the conversion options.

XQuery
With the xquery format, CSV records are converted to a sequence of arrays:


 * The resulting value will be a map with a records and an optional names key.
 * Records are organized as a sequence of arrays. A single array contains the entries of a single record.
 * The column names will be available if header option is set to true.

The CSV map can e.g. be accessed as follows:

 for $record at $pos in $csv?records return $pos || ". " || string-join($record?*, ', ')
 * returns all entries of the 5th record (row)
 * returns all entries of the 2nd field (column)
 * returns the names of all fields (if available)
 * Return enumerated strings for all records:

The resulting representation consumes less memory than XML-based formats, and values can be directly accessed without conversion. Thus, it is recommendable for very large inputs and for efficient ad-hoc processing.

Options
In the following table, all available options are listed. The Excel column indicates what are the preferred options for data that is to be imported, or has been exported from Excel.

=Functions=

csv:serialize
=Examples=

Example 1: Converts CSV data to XML, interpreting the first row as table header:

Input addressbook.csv:  Name,First Name,Address,City Huber,Sepp,Hauptstraße 13,93547 Hintertupfing

Query:  let $text := file:read-text('addressbook.csv') return csv:parse($text, map { 'header': true })

Result:  Huber Sepp Hauptstraße 13 93547 Hintertupfing '''

Example 2: Converts some CSV data to XML and back, and checks if the input and output are equal. The expected result is true:

Query:  let $options := map { 'lax': false } let $input := file:read-text('some-data.csv') let $output := $input => csv:parse($options) => csv:serialize($options) return $input eq $output

Example 3: Converts CSV data to XQuery and returns distinct column values:

Query:  let $text := ``[Name,City Jack,Chicago Jack,Washington John,New York ]`` let $options := map { 'format': 'xquery', 'header': true } let $csv := csv:parse($text, $options) return ( 'Distinct values:',  let $records := $csv('records')  for $name at $pos in $csv('names')?*  let $values := $records?($pos)  return ( '* ' || $name || ': ' || string-join(distinct-values($values), ', ') ) )

Result:  Distinct values:
 * Name: Jack, John
 * City: Chicago, Washington, New York

=Errors=

=Changelog=


 * Version 9.1
 * Updated: csv:parse can be called with empty sequence.


 * Version 9.0


 * Added: xquery option
 * Removed: map option
 * Updated: error codes updated; errors now use the module namespace


 * Version 8.6


 * Updated: Options: improved Excel compatibility


 * Version 8.0


 * Added: backslashes option


 * Version 7.8


 * Updated: csv:parse now returns a document node instead of an element, or an XQuery map if format is set to map.
 * Added: format and lax options

The module was introduced with Version 7.7.2.