Changes

Jump to navigation Jump to search
7,528 bytes added ,  18:51, 18 November 2020
This [[Module Library|XQuery Module]] contains a single function to parse CSV input. [httphttps://en.wikipedia.org/wiki/Comma-separated_values CSV] (comma-separated values) is a popular representation for tabular data, exported e. g. from Excel.
=Conventions=
All functions and errors in this module are assigned to the {{Code|<code><nowiki>http://basex.org/modules/csv}} </nowiki></code> namespace, which is statically bound to the {{Code|csv}} prefix.<br/>All errors are assigned to the {{Code|http://basex.org/errors}} namespace, which is statically bound to the {{Code|bxerr}} prefix.
==Conversion== ===XML: Direct, Attributes=== If the {{Code|direct}} or {{Code|attributes}} format is chosen, a CSV string is converted to XML: * The resulting XML document has a {{Code|csv}} root element.* Rows are represented via {{Code|record}} elements.* Fields are represented via {{Code|entry}} elements. The value of a field is represented as text node.* If the {{Code|header}} option is set to {{Code|true}}, the first text line is parsed as table header:** If {{Code|format}} is set to {{Code|direct}}, the field names are encoded, as described in the [[Conversion Module#Keys|Conversion Module]], and used as element names.** Otherwise, if {{Code|format}} is {{Code|attributes}}, the field names will be stored in name attributes. '''A little advice''': in the Database Creation dialog of the GUI, if you select CSV Parsing and switch to the ''Parsing'' tab, you can see the effects of some of the conversion options. ===XQuery=== With the {{Code|xquery}} format, CSV records are converted to a sequence of arrays: * The resulting value will be a map with a {{Code|records}} and an optional {{Code|names}} key.* Records are organized as a sequence of arrays. A single array contains the entries of a single record.* The column names will be available if {{Code|header}} option is set to {{Code|true}}. The CSV map can e.g. be accessed as follows: * <code>$csv?records[5]</code> returns all entries of the 5th record (row)* <code>$csv?records(2)</code> returns all entries of the 2nd field (column)* <code>$csv?names?*</code> returns the names of all fields (if available)* Return enumerated strings for all records:<syntaxhighlight lang="xquery">for $record at $pos in $csv?recordsreturn $pos || ". " || string-join($record?*, ', ')</syntaxhighlight> The resulting representation consumes less memory than XML-based formats, and values can be directly accessed without conversion. Thus, it is recommendable for very large inputs and for efficient ad-hoc processing. ==Options== In the following table, all available options are listed. The Excel column indicates what are the preferred options for data that is to be imported, or has been exported from Excel. {| class="wikitable sortable" width="100%"|- valign="top"! width="140" | Option! width="50%" | Description! Allowed! Default! Excel|- valign="top"| {{Code|separator}}| Defines the character which separates the values of a single record.| {{Code|comma}}, {{Code|semicolon}}, {{Code|colon}}, {{Code|tab}}, {{Code|space}} or a ''single character''| {{Code|comma}}| {{Code|semicolon}}|- valign="top"| {{Code|header}}| Indicates if the first line of the parsed or serialized CSV data is a table header.| {{Code|yes}}, {{Code|no}}| {{Code|no}}||- valign="top"| {{Code|format}}| Specifies the format of the XML data:<br/>* With {{Code|direct}} conversion, field names are represented as element names* With {{Code|attributes}} conversion, field names are stored in {{Code|name}} attributes* With {{Code|xquery}} conversion, the input is converted to an XQuery map| {{Code|direct}}, {{Code|attributes}}, {{Code|xquery}}| {{Code|direct}}||- valign="top"| {{Code|lax}}| Specifies if a lax approach is used to convert QNames to JSON names.| {{Code|yes}}, {{Code|no}}| {{Code|yes}}| {{Code|no}}|- valign="top"| {{Code|quotes}}| Specifies how quotes are parsed:* Parsing: If the option is enabled, quotes at the start and end of a value will be treated as control characters. Separators and newlines within the quotes will be adopted without change.* Serialization: If the option is enabled, the value will be wrapped with quotes if it contains characters that might be treated as control characters. A quote character in the value will be encoded according to the rules of the {{Code|backslashes}} option.| {{Code|yes}}, {{Code|no}}| {{Code|yes}}| {{Code|yes}}|- valign="top"| {{Code|backslashes}}| Specifies how quotes and other characters are escaped:* Parsing: If the option is enabled, {{Code|\r}}, {{Code|n}} and {{Code|\t}} will be replaced with the corresponding control characters. All other escaped characters will be adopted as literals (e.g.: {{Code|\"}} → {{Code|"}}). If the option is disabled, two consecutive quotes will be replaced with a single quote (unless {{Code|quotes}} is enabled and the quote is the first or last character of a value).* Serialization: If the option is enabled, {{Code|\r}}, {{Code|n}}, {{Code|\t}}, {{Code|"}} and the separator character will be encoded with a backslash. If the option is disabled, quotes will be duplicated.| {{Code|yes}}, {{Code|no}}| {{Code|no}}| {{Code|no}}|} =Functions=
==csv:parse==
 
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|csv:parse|$input string as xs:string?|elementitem(csv)?}}<br/>{{Func|csv:parse|$input string as xs:string?, $options as itemmap(*)?|elementitem(csv)?}}
|-
| '''Summary'''
|Converts the CSV data specified by {{Code|$inputstring}} to XML, and returns the result as {{Code|element(csv)}} an XQuery value.<br/>The {{Code|$options}} argument can be used to control the way the input is converted. The following options are available:* {{Code|separator}} defines the character which separates columns in a row. By default, this is a comma ({{Code-|,}}).'''Errors'''* |{{CodeError|parse|headers#Errors}} specifies if the specified input contains a header row. The default value is {{Code|false}}.Options can either cannot be specified<br />* parsed as children of an {{Code|<csv:options/>}} element; eCSV document.g.:<pre class="brush:xml"><csv:options> <csv:separator value=';'/> ...</csv:options></pre>* or as map, which contains all key/value pairs:<pre class="brush:xquery">{ 'separator' : ';', ... |}</pre>
==csv:doc==
 
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|csv:doc|$uri as xs:string?|item()?}}<br />{{Func|csv:doc|$uri as xs:string?, $options as map(*)?|item()?}}<br />
|-
| '''Summary'''
|Fetches the CSV document referred to by the given {{Code|$uri}} and converts it to an XQuery value. The {{Code|$options}} argument can be used to control the way the input is converted.
|-
| '''Errors'''
|{{Error|BXCS0001parse|#Errors}} the specified separator must input cannot be a single characterparsed as CSV document.<br/>{{Error|options|#Errors}} the specified options are conflicting.
|}
 
==csv:serialize==
 
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|csv:serialize|$input as item()?|xs:string}}<br/>{{Func|csv:serialize|$input as item()?, $options as map(*)?|xs:string}}
|-
| '''Summary'''
|Serializes the specified {{Code|$input}} as CSV, using the specified {{Code|$options}}, and returns the result as string.
Values can also be serialized as CSV with the standard [[Serialization]] feature of XQuery:
* The parameter {{Code|method}} needs to be set to {{Code|csv}}, and
* the options presented in this article need to be assigned to the {{Code|csv}} parameter.
|-
| '''Errors'''
|{{Error|serialize|#Errors}} the input cannot be serialized.
|}
 
=Examples=
 
'''Example 1:''' Converts CSV data to XML, interpreting the first row as table header:
 
'''Input''' {{Code|addressbook.csv}}:
<syntaxhighlight lang="xml">
Name,First Name,Address,City
Huber,Sepp,Hauptstraße 13,93547 Hintertupfing
</syntaxhighlight>
 
'''Query:'''
<syntaxhighlight lang="xquery">
let $text := file:read-text('addressbook.csv')
return csv:parse($text, map { 'header': true() })
</syntaxhighlight>
 
'''Result:'''
<syntaxhighlight lang="xml">
<csv>
<record>
<Name>Huber</Name>
<First_Name>Sepp</First_Name>
<Address>Hauptstraße 13</Address>
<City>93547 Hintertupfing</City>
</record>
</csv>
</syntaxhighlight>
'''
 
'''Example 2:''' Converts some CSV data to XML and back, and checks if the input and output are equal. The expected result is {{Code|true}}:
 
'''Query:'''
<syntaxhighlight lang="xquery">
let $options := map { 'lax': false() }
let $input := file:read-text('some-data.csv')
let $output := $input => csv:parse($options) => csv:serialize($options)
return $input eq $output
</syntaxhighlight>
 
'''Example 3:''' Converts CSV data to XQuery and returns distinct column values:
 
'''Query:'''
<syntaxhighlight lang="xquery">
let $text := ``[Name,City
Jack,Chicago
Jack,Washington
John,New York
]``
let $options := map { 'format': 'xquery', 'header': true() }
let $csv := csv:parse($text, $options)
return (
'Distinct values:',
let $records := $csv('records')
for $name at $pos in $csv('names')?*
let $values := $records?($pos)
return (
'* ' || $name || ': ' || string-join(distinct-values($values), ', ')
)
)
</syntaxhighlight>
 
'''Result:'''
<syntaxhighlight lang="xquery">
Distinct values:
* Name: Jack, John
* City: Chicago, Washington, New York
</syntaxhighlight>
=Errors=
|Description
|-
|{{Code|BXCS0001parse}}| The specified separator must input cannot be a single characterparsed.|-|{{Code|serialize}}| The node cannot be serialized.
|}
=Changelog=
 
;Version 9.4
* Added: [[#csv:doc|csv:doc]]
 
; Version 9.1
* Updated: [[#csv:parse|csv:parse]] can be called with empty sequence.
 
;Version 9.0
 
* Added: {{Code|xquery}} option
* Removed: {{Code|map}} option
* Updated: error codes updated; errors now use the module namespace
 
;Version 8.6
 
* Updated: [[#Options|Options]]: improved Excel compatibility
 
;Version 8.0
 
* Added: {{Code|backslashes}} option
 
;Version 7.8
 
* Updated: [[#csv:parse|csv:parse]] now returns a document node instead of an element, or an XQuery map if {{Code|format}} is set to {{Code|map}}.
* Added: {{Code|format}} and {{Code|lax}} options
The module was introduced with Version 7.7.2.
 
[[Category:XQuery]]
Bureaucrats, editor, reviewer, Administrators
13,550

edits

Navigation menu