Main Page » XQuery » Functions » CSV Functions

CSV Functions

This module contains a single function to parse CSV input. CSV (comma-separated values) is a popular format for tabular data, exported e.g. from Excel.

Conventions

All functions and errors in this module are assigned to the http://basex.org/modules/csv namespace, which is statically bound to the csv prefix.

Conversion Formats

XML: Direct, Attributes

If the direct or attributes format is chosen, a CSV string is converted to XML:

  • The resulting XML document has a csv root element.
  • Rows are represented via record elements.
  • Fields are represented via entry elements. The value of a field is represented as text node.
  • If the header option is set to true, the first text line is parsed as table header:
    • If format is set to direct, the field names are encoded, as described in the Conversion Functions, and used as element names.
    • Otherwise, if format is attributes, the field names will be stored in name attributes.
A little advice: in the Database Creation dialog of the GUI, if you select CSV Parsing and switch to the Parsing tab, you can see the effects of some of the conversion options.

XQuery

With the xquery format, CSV records are converted to a sequence of arrays:

  • The resulting value will be a map with a records entry and an optional names entry.
  • Records are organized as a sequence of arrays. An array contains the entries of a single record.
  • The names entry contains an array with the column names. It is generated if the header option is set to true.

The CSV map can e.g. be accessed as follows:

  • $csv?records[5] returns all entries of the 5th record (row)
  • $csv?records?(2) returns all entries of the 2nd field (column)
  • $csv?names?* returns the names of all fields (if available)
  • Return enumerated strings for all records:
for $record at $pos in $csv?records
return $pos || ". " || string-join($record?*, ', ')

The resulting representation consumes less memory than XML-based formats, and values can be directly accessed without conversion. Thus, it is recommendable for very large inputs and for efficient ad-hoc processing.

Options

In the following table, all available options are listed. The Excel column lists recommended options for data that is processed with Excel or Open/Libre Office.

Option Description Allowed Default Direction Excel
separator Defines the character which separates the values of a single record. comma, semicolon, colon, tab, space or a single character comma parse, serialize semicolon or comma, depending on the region
header Indicates if the first line of the parsed or serialized CSV data is a table header. yes, no no parse, serialize
format Specifies the format for converting CSV data (see above). direct, attributes, xquery direct parse, serialize
lax Specifies if lax conversion rules are used to convert QNames to JSON names. yes, no yes parse, serialize no
quotes Specifies how quotes are parsed:
  • Parsing: If the option is enabled, quotes at the start and end of a value will be treated as control characters. Separators and newlines within the quotes will be adopted without change.
  • Serialization: If the option is enabled, the value will be wrapped with quotes if it contains characters that might be treated as control characters. A quote character in the value will be encoded according to the rules of the backslashes option.
yes, no yes parse, serialize yes
backslashes Specifies how quotes and other characters are escaped:
  • Parsing: If the option is enabled, \r, n and \t will be replaced with the corresponding control characters. All other escaped characters will be adopted as literals (e.g.: \""). If the option is disabled, two consecutive quotes will be replaced with a single quote (unless quotes is enabled and the quote is the first or last character of a value).
  • Serialization: If the option is enabled, \r, n, \t, " and the separator character will be encoded with a backslash. If the option is disabled, quotes will be duplicated.
yes, no no parse, serialize no
skip-empty Only gets effective for the formats direct or attribute, and if the header option is enabled. Please note that if this option is used and the data is serialized again, the resulting CSV header may be incomplete. yes, no no parse no

Functions

csv:doc

Signature
csv:doc(
  $source   as xs:string?,
  $options  as map(*)?     := {}
) as item()?
SummaryFetches the CSV document referred to by the given $source and converts it to an XQuery value. The $options argument can be used to control the way the input is converted.
Errors
options
parseThe input cannot be parsed.

csv:parse

Signature
csv:parse(
  $value    as xs:string?,
  $options  as map(*)?     := {}
) as item()?
SummaryConverts the CSV $value to an XQuery value. The $options argument can be used to control the way the input is converted.
Errors
parseThe input cannot be parsed.

csv:serialize

Signature
csv:serialize(
  $input    as item()?,
  $options  as map(*)?  := {}
) as xs:string
SummarySerializes the specified $input as CSV, using the specified $options, and returns the result as string. Values can also be serialized as CSV with the standard Serialization feature of XQuery:
  • The parameter method needs to be set to csv, and
  • the options presented in this article need to be assigned to the csv parameter.
Errors
serializeThe node cannot be serialized.

Examples

Example 1: Converts CSV data to XML, interpreting the first row as table header: Input addressbook.csv:
Name,First Name,Address,City
Huber,Sepp,Hauptstraße 13,93547 Hintertupfing
Query:
let $text := file:read-text('addressbook.csv')
return csv:parse($text, { 'header': true() })
Result:
<csv>
  <record>
    <Name>Huber</Name>
    <First_Name>Sepp</First_Name>
    <Address>Hauptstraße 13</Address>
    <City>93547 Hintertupfing</City>
  </record>
</csv>
''' Example 2: Convert some CSV data to XML and back, and check if the input and output are equal. The expected result is true: Query:
let $options := { 'lax': false() }
let $input := file:read-text('some-data.csv')
let $output := $input => csv:parse($options) => csv:serialize($options)
return $input eq $output
Example 3: Convert CSV data to XQuery and returns distinct column values: Query:
let $text := ``[Name,City
Jack,Chicago
Jack,Washington
John,New York
]``
let $options := { 'format': 'xquery', 'header': true() }
let $csv := csv:parse($text, $options)
return (
  'Distinct values:',
  let $records := $csv('records')
  for $name at $pos in $csv('names')?*
  let $values := $records?($pos)
  return (
    '* ' || $name || ': ' || string-join(distinct-values($values), ', ')
  )
)
Result:
Distinct values:
* Name: Jack, John
* City: Chicago, Washington, New York

Errors

CodeDescription
parseThe input cannot be parsed.
serializeThe node cannot be serialized.

Changelog

Version 11.0Version 9.7Version 9.4Version 9.1
  • Updated: csv:parse can be called with empty sequence.
Version 9.0
  • Added: xquery option
  • Updated: error codes updated; errors now use the module namespace
  • Removed: map option
Version 8.6
  • Updated: Options: improved Excel compatibility
Version 8.0
  • Added: backslashes option
Version 7.8
  • Added: format and lax options
  • Updated: csv:parse now returns a document node instead of an element, or an XQuery map if format is set to map.
Version 7.7.2
  • Added: New module added.

⚡Generated with XQuery