CSV Functions
This module contains a single function to parse CSV input. CSV (comma-separated values) is a popular format for tabular data, exported e.g. from Excel.
Conventions
All functions and errors in this module are assigned to the http://basex.org/modules/csv namespace, which is statically bound to the csv prefix.
Conversion Formats
XML: Direct, Attributes
If the direct or attributes format is chosen, a CSV string is converted to XML:
- The resulting XML document has a
csvroot element. - Rows are represented via
recordelements. - Fields are represented via
entryelements. The value of a field is represented as text node. - If the
headeroption is set totrue, the first text line is parsed as table header:- If
formatis set todirect, the field names are encoded, as described in the Conversion Functions, and used as element names. - Otherwise, if
formatisattributes, the field names will be stored in name attributes.
- If
XQuery
With the xquery format, CSV records are converted to a sequence of arrays:
- The resulting value will be a map with a
recordsentry and an optionalnamesentry. - Records are organized as a sequence of arrays. An array contains the entries of a single record.
- The
namesentry contains an array with the column names. It is generated if theheaderoption is set totrue.
The CSV map can e.g. be accessed as follows:
$csv?records[5]returns all entries of the 5th record (row)$csv?records?(2)returns all entries of the 2nd field (column)$csv?names?*returns the names of all fields (if available)- Return enumerated strings for all records:
for $record at $pos in $csv?records
return $pos || ". " || string-join($record?*, ', ')
The resulting representation consumes less memory than XML-based formats, and values can be directly accessed without conversion. Thus, it is recommendable for very large inputs and for efficient ad-hoc processing.
Options
In the following table, all available options are listed. The Excel column lists recommended options for data that is processed with Excel or Open/Libre Office.
| Option | Description | Allowed | Default | Direction | Excel |
|---|---|---|---|---|---|
separator |
Defines the character which separates the values of a single record. | comma, semicolon, colon, tab, space or a single character |
comma |
parse, serialize | semicolon or comma, depending on the region |
header |
Indicates if the first line of the parsed or serialized CSV data is a table header. | yes, no |
no |
parse, serialize | |
format |
Specifies the format for converting CSV data (see above). | direct, attributes, xquery |
direct |
parse, serialize | |
lax |
Specifies if lax conversion rules are used to convert QNames to JSON names. | yes, no |
yes |
parse, serialize | no |
quotes |
Specifies how quotes are parsed:
| yes, no |
yes |
parse, serialize | yes |
backslashes |
Specifies how quotes and other characters are escaped:
| yes, no |
no |
parse, serialize | no |
skip-empty |
Updated: Option added. Save memory by skipping entries in the result that would contain an empty string. Only gets effective for the formatsdirect or attribute, and if the header option is enabled. Note that skipping entries may cause columns to be lost in the serialised CSV output. |
yes, no |
no |
parse | no |
Functions
csv:doc
| Signature | csv:doc(
$href as xs:string?,
$options as map(*)? := {}
) as item()? | ||||
|---|---|---|---|---|---|
| Summary | Fetches the CSV document referred to by the given $href and converts it to an XQuery value. The $options argument can be used to control the way the input is converted. | ||||
| Errors |
|
csv:parse
| Signature | csv:parse(
$value as xs:string?,
$options as map(*)? := {}
) as item()? | ||
|---|---|---|---|
| Summary | Converts the CSV $value to an XQuery value. The $options argument can be used to control the way the input is converted. | ||
| Errors |
|
csv:serialize
| Signature | csv:serialize(
$input as item()?,
$options as map(*)? := {}
) as xs:string | ||
|---|---|---|---|
| Summary | Serializes the specified $input as CSV, using the specified $options, and returns the result as string.
Values can also be serialized as CSV with the standard Serialization feature of XQuery:
| ||
| Errors |
|
Examples
Example 1: Converts CSV data to XML, interpreting the first row as table header:
Inputaddressbook.csv
Name,First Name,Address,City
Huber,Sepp,Hauptstraße 13,93547 Hintertupfing
Query
let $text := file:read-text('addressbook.csv')
return csv:parse($text, { 'header': true() })
Result
<csv>
<record>
<Name>Huber</Name>
<First_Name>Sepp</First_Name>
<Address>Hauptstraße 13</Address>
<City>93547 Hintertupfing</City>
</record>
</csv>
Example 2: Convert some CSV data to XML and back, and check if the input and output are equal. The expected result is true:
let $options := { 'lax': false() }
let $input := file:read-text('some-data.csv')
let $output := $input => csv:parse($options) => csv:serialize($options)
return $input eq $output
Example 3: Convert CSV data to XQuery and returns distinct column values:
Querylet $text := ``[Name,City
Jack,Chicago
Jack,Washington
John,New York
]``
let $options := { 'format': 'xquery', 'header': true() }
let $csv := csv:parse($text, $options)
return (
'Distinct values:',
let $records := $csv('records')
for $name at $pos in $csv('names')?*
let $values := $records?($pos)
return (
'* ' || $name || ': ' || string-join(distinct-values($values), ', ')
)
)
Result
Distinct values:
* Name: Jack, John
* City: Chicago, Washington, New York
Errors
| Code | Description |
|---|---|
parse | The input cannot be parsed. |
serialize | The node cannot be serialized. |
Changelog
Version 11.0Version 9.7- Added: Options:
allowoption.
- Added:
csv:doc
- Updated:
csv:parsecan be called with empty sequence.
- Added:
xqueryoption - Updated: error codes updated; errors now use the module namespace
- Removed:
mapoption
- Updated: Options: improved Excel compatibility
- Added:
backslashesoption
- Added:
formatandlaxoptions - Updated:
csv:parsenow returns a document node instead of an element, or an XQuery map ifformatis set tomap.
- Added: New module added.