CSV Functions
This module contains a single function to parse CSV input. CSV (comma-separated values) is a popular format for tabular data, exported e.g. from Excel.
Conventions
All functions and errors in this module are assigned to the http://basex.org/modules/csv
namespace, which is statically bound to the csv
prefix.
Conversion Formats
XML: Direct, Attributes
If the direct
or attributes
format is chosen, a CSV string is converted to XML:
- The resulting XML document has a
csv
root element. - Rows are represented via
record
elements. - Fields are represented via
entry
elements. The value of a field is represented as text node. - If the
header
option is set totrue
, the first text line is parsed as table header:- If
format
is set todirect
, the field names are encoded, as described in the Conversion Functions, and used as element names. - Otherwise, if
format
isattributes
, the field names will be stored in name attributes.
- If
XQuery
With the xquery
format, CSV records are converted to a sequence of arrays:
- The resulting value will be a map with a
records
entry and an optionalnames
entry. - Records are organized as a sequence of arrays. An array contains the entries of a single record.
- The
names
entry contains an array with the column names. It is generated if theheader
option is set totrue
.
The CSV map can e.g. be accessed as follows:
$csv?records[5]
returns all entries of the 5th record (row)$csv?records?(2)
returns all entries of the 2nd field (column)$csv?names?*
returns the names of all fields (if available)- Return enumerated strings for all records:
for $record at $pos in $csv?records
return $pos || ". " || string-join($record?*, ', ')
The resulting representation consumes less memory than XML-based formats, and values can be directly accessed without conversion. Thus, it is recommendable for very large inputs and for efficient ad-hoc processing.
Options
In the following table, all available options are listed. The Excel column lists recommended options for data that is processed with Excel or Open/Libre Office.
Option | Description | Allowed | Default | Direction | Excel |
---|---|---|---|---|---|
separator |
Defines the character which separates the values of a single record. | comma , semicolon , colon , tab , space or a single character |
comma |
parse, serialize | semicolon or comma , depending on the region |
header |
Indicates if the first line of the parsed or serialized CSV data is a table header. | yes , no |
no |
parse, serialize | |
format |
Specifies the format for converting CSV data (see above). | direct , attributes , xquery |
direct |
parse, serialize | |
lax |
Specifies if lax conversion rules are used to convert QNames to JSON names. | yes , no |
yes |
parse, serialize | no |
quotes |
Specifies how quotes are parsed:
| yes , no |
yes |
parse, serialize | yes |
backslashes |
Specifies how quotes and other characters are escaped:
| yes , no |
no |
parse, serialize | no |
skip-empty |
Updated: Option added. Save memory by skipping entries in the result that would contain an empty string. Only gets effective for the formatsdirect or attribute , and if the header option is enabled. Note that skipping entries may cause columns to be lost in the serialised CSV output. |
yes , no |
no |
parse | no |
Functions
csv:doc
Signature | csv:doc( $href as xs:string?, $options as map(*)? := {} ) as item()? | ||||
---|---|---|---|---|---|
Summary | Fetches the CSV document referred to by the given $href and converts it to an XQuery value. The $options argument can be used to control the way the input is converted. | ||||
Errors |
|
csv:parse
Signature | csv:parse( $value as xs:string?, $options as map(*)? := {} ) as item()? | ||
---|---|---|---|
Summary | Converts the CSV $value to an XQuery value. The $options argument can be used to control the way the input is converted. | ||
Errors |
|
csv:serialize
Signature | csv:serialize( $input as item()?, $options as map(*)? := {} ) as xs:string | ||
---|---|---|---|
Summary | Serializes the specified $input as CSV, using the specified $options , and returns the result as string.
Values can also be serialized as CSV with the standard Serialization feature of XQuery:
| ||
Errors |
|
Examples
Example 1: Converts CSV data to XML, interpreting the first row as table header: Inputaddressbook.csv
:
Name,First Name,Address,City
Huber,Sepp,Hauptstraße 13,93547 Hintertupfing
Query:
let $text := file:read-text('addressbook.csv')
return csv:parse($text, { 'header': true() })
Result:
<csv>
<record>
<Name>Huber</Name>
<First_Name>Sepp</First_Name>
<Address>Hauptstraße 13</Address>
<City>93547 Hintertupfing</City>
</record>
</csv>
Example 2: Convert some CSV data to XML and back, and check if the input and output are equal. The expected result is true
:
Query:
let $options := { 'lax': false() }
let $input := file:read-text('some-data.csv')
let $output := $input => csv:parse($options) => csv:serialize($options)
return $input eq $output
Example 3: Convert CSV data to XQuery and returns distinct column values:
Query:
let $text := ``[Name,City
Jack,Chicago
Jack,Washington
John,New York
]``
let $options := { 'format': 'xquery', 'header': true() }
let $csv := csv:parse($text, $options)
return (
'Distinct values:',
let $records := $csv('records')
for $name at $pos in $csv('names')?*
let $values := $records?($pos)
return (
'* ' || $name || ': ' || string-join(distinct-values($values), ', ')
)
)
Result:
Distinct values:
* Name: Jack, John
* City: Chicago, Washington, New York
Errors
Code | Description |
---|---|
parse | The input cannot be parsed. |
serialize | The node cannot be serialized. |
Changelog
Version 11.0Version 9.7- Added: Options:
allow
option.
- Added:
csv:doc
- Updated:
csv:parse
can be called with empty sequence.
- Added:
xquery
option - Updated: error codes updated; errors now use the module namespace
- Removed:
map
option
- Updated: Options: improved Excel compatibility
- Added:
backslashes
option
- Added:
format
andlax
options - Updated:
csv:parse
now returns a document node instead of an element, or an XQuery map ifformat
is set tomap
.
- Added: New module added.