CSV Module

From BaseX Documentation

This XQuery Module contains functions to parse and serialize CSV data. CSV (comma-separated values) is a popular representation for tabular data, exported e.g. from Excel.

Conventions

Updated with Version 9.0:

All functions and errors in this module are assigned to the http://basex.org/modules/csv namespace, which is statically bound to the csv prefix.

Conversion

XML: Direct, Attributes

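If the direct or attributes format is chosen, a CSV string is converted to an XML document with a csv root element, in which each row is represented by a record element. If the header option is enabled, the first line is parsed as table header: with direct, the field names are used as element names; with attributes, they are stored in name attributes.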

Tip: to see the effects of some of the conversion options, select CSV Parsing in the Database Creation dialog of the GUI and switch to the Parsing tab.
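
The difference between the two XML formats can be illustrated with a minimal sketch. The sample data is invented for this illustration; only csv:parse and the header and format options are taken from this article:

```xquery
(: Sample data, invented for this illustration :)
let $text := 'Name,City&#xA;Huber,Hintertupfing'
(: Parse the same input once per XML format :)
for $format in ('direct', 'attributes')
return csv:parse($text, map { 'header': true(), 'format': $format })
```

The first result uses Name and City elements, as in Example 1 below. The second is expected to use generic entry elements that carry the column name in a name attribute.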

XQuery

This format was introduced with Version 9.0. It is more flexible and lightweight than the old, discarded map format.

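With the xquery format, the CSV data is converted to a map. Its records key holds the records as a sequence of arrays, one array per record; if the header option is enabled, its names key holds the column names as an array.

For example, the resulting map can be accessed as follows: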

for $record at $pos in $csv?records
return $pos || ". " || string-join($record?*, ', ')

The resulting representation consumes less memory than the XML-based formats, and values can be accessed directly without conversion. It is therefore recommended for very large inputs and for efficient ad-hoc processing.
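
As a self-contained sketch of this access pattern (the sample data is invented for this illustration), the column names and a single column can be read with the lookup operator:

```xquery
(: Sample data, invented for this illustration :)
let $text := 'Name,City&#xA;Jack,Chicago&#xA;John,New York'
let $csv := csv:parse($text, map { 'format': 'xquery', 'header': true() })
return (
  (: column names, available because header is enabled :)
  string-join($csv?names?*, ' | '),
  (: second field of every record :)
  $csv?records?2
)
```

With this input, the query should return the string Name | City, followed by Chicago and New York.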

Options

The following table lists all available options. The Excel column shows the preferred options for data that is to be imported into, or has been exported from, Excel.

Option | Description | Allowed | Default | Excel
separator | Defines the character that separates the values of a single record. | comma, semicolon, colon, tab, space or a single character | comma | semicolon
header | Indicates whether the first line of the parsed or serialized CSV data is a table header. | yes, no | no |
format | Specifies the format of the converted data. With direct conversion, field names are represented as element names. With attributes conversion, field names are stored in name attributes. With xquery conversion, the input is converted to an XQuery map. | direct, attributes, xquery | direct |
lax | Specifies whether a lax approach is used to convert field names to XML element names. | yes, no | yes | no
quotes | Specifies how quotes are handled. Parsing: if the option is enabled, quotes at the start and end of a value will be treated as control characters; separators and newlines within the quotes will be adopted without change. Serialization: if the option is enabled, the value will be wrapped with quotes; a quote character in the value will be encoded according to the rules of the backslashes option. | yes, no | yes | yes
backslashes | Specifies how quotes and other characters are escaped. Parsing: if the option is enabled, \r, \n and \t will be replaced with the corresponding control characters, and all other escaped characters will be adopted as literals (e.g. \"). If the option is disabled, two consecutive quotes will be replaced with a single quote (unless quotes is enabled and the quote is the first or last character of a value). Serialization: if the option is enabled, \r, \n, \t, " and the separator character will be encoded with a backslash. If the option is disabled, quotes will be duplicated. | yes, no | no | no
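
A short sketch of how the separator, header and quotes options work together when parsing semicolon-separated data. The sample data is invented for this illustration, and one value contains the separator inside quotes:

```xquery
(: Sample data, invented for this illustration: the first value
   of the record contains the separator and is wrapped in quotes :)
let $text := 'Name;City&#xA;"Huber; Sepp";Hintertupfing'
let $options := map {
  'separator': 'semicolon',
  'header': true(),
  'quotes': true()
}
return csv:parse($text, $options)
```

Because quotes is enabled, the semicolon inside the quoted value should be kept as part of the value, so the Name element is expected to contain Huber; Sepp.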

Functions

csv:parse

Signatures:
  csv:parse($input as xs:string) as document-node(element(csv))
  csv:parse($input as xs:string, $options as map(*)?) as item()
Summary: Converts the CSV data specified by $input to an XML document or a map. The $options argument can be used to control the way the input is converted.
Errors: parse: the input cannot be parsed.

csv:serialize

Signatures:
  csv:serialize($input as item()?) as xs:string
  csv:serialize($input as item()?, $options as map(*)?) as xs:string
Summary: Serializes the specified $input as CSV, using the specified $options, and returns the result as string.

Values can also be serialized as CSV with the standard Serialization feature of XQuery:

  • The parameter method needs to be set to csv, and
  • the options presented in this article need to be assigned to the csv parameter.

Errors: serialize: the input cannot be serialized.
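
As a minimal sketch of serialization, the following query turns a small XML document in the direct format into semicolon-separated text. The document and its values are invented for this illustration:

```xquery
(: Sample document in the direct format, invented for this illustration :)
let $xml :=
  <csv>
    <record><Name>Huber</Name><City>Hintertupfing</City></record>
    <record><Name>Meier</Name><City>Oberdorf</City></record>
  </csv>
(: Emit a header line and use semicolons as separators :)
return csv:serialize($xml, map { 'header': true(), 'separator': 'semicolon' })
```

The result should be a string whose first line is Name;City, followed by one line per record. With the standard Serialization feature, the same options would presumably be declared in the query prolog, for instance as declare option output:method 'csv'; and declare option output:csv 'header=yes, separator=semicolon';.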

Examples

Example 1: Converts CSV data to XML, interpreting the first row as table header:

Input addressbook.csv:

Name,First Name,Address,City
Huber,Sepp,Hauptstraße 13,93547 Hintertupfing

Query:

let $text := file:read-text('addressbook.csv')
return csv:parse($text, map { 'header': true() })

Result:

<csv>
  <record>
    <Name>Huber</Name>
    <First_Name>Sepp</First_Name>
    <Address>Hauptstraße 13</Address>
    <City>93547 Hintertupfing</City>
  </record>
</csv>

Example 2: Converts some CSV data to XML and back, and checks if the input and output are equal. The expected result is true:

Query:

let $options := map { 'lax': false() }
let $input := file:read-text('some-data.csv')
let $output := $input => csv:parse($options) => csv:serialize($options)
return $input eq $output

Example 3: Converts CSV data to XQuery and returns distinct column values:

Query:

let $text := ``[Name,City
Jack,Chicago
Jack,Washington
John,New York
]``
let $options := map { 'format': 'xquery', 'header': true() }
let $csv := csv:parse($text, $options)
return (
  'Distinct values:',
  let $records := $csv('records')
  for $name at $pos in $csv('names')?*
  let $values := $records?($pos)
  return (
    '* ' || $name || ': ' || string-join(distinct-values($values), ', ')
  )
)

Result:

Distinct values:
* Name: Jack, John
* City: Chicago, Washington, New York

Errors

Updated with Version 9.0:

Code | Description
parse | The input cannot be parsed.
serialize | The input cannot be serialized.
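
Since the error codes are bound to the csv namespace, they can be caught by name. The following is a sketch under the assumption that the error is raised as csv:parse; the sample data and the error element are invented for this illustration:

```xquery
(: Sample data, invented for this illustration :)
let $text := 'Name,City&#xA;Huber,Hintertupfing'
return try {
  csv:parse($text, map { 'header': true() })
} catch csv:parse {
  (: $err:description holds the message of the raised error :)
  <error>{ $err:description }</error>
}
```

For well-formed input such as the one above, the try branch simply returns the parsed document; the catch branch is only evaluated if parsing fails.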

Changelog

Version 9.0
Version 8.6
Version 8.0
Version 7.8

The module was introduced with Version 7.7.2.
