Changes

Jump to navigation Jump to search
5,435 bytes added ,  18:51, 18 November 2020
This [[Module Library|XQuery Module]] contains a single function to parse CSV input. [httphttps://en.wikipedia.org/wiki/Comma-separated_values CSV] (comma-separated values) is a popular representation for tabular data, exported e. g. from Excel.
=Conventions=
All functions and errors in this module are assigned to the {{Code|<code><nowiki>http://basex.org/modules/csv}} </nowiki></code> namespace, which is statically bound to the {{Code|csv}} prefix.<br/>All errors are assigned to the {{Code|http://basex.org/errors}} namespace, which is statically bound to the {{Code|bxerr}} prefix.
=Rules=Conversion==
{{Version|7.7.2}}: the conversion rules have been updated and aligned with the JSON parser===XML:Direct, Attributes===
The conversion of If the {{Code|direct}} or {{Code|attributes}} format is chosen, a CSV data string is based on the following rulesconverted to XML:
# * The resulting XML document has a {{Code|<csv/>}} root nodeelement. # * Rows are represented via {{Code|<record/>}} nodeselements. # * Fields are either named represented via {{Code|entry}} or (elements. The value of a field is represented as text node.* If the {{Code|header}} option is set to {{Code|true}}, the first text line is parsed as table header:** If {{Code|format}} is set to {{Code|direct}}, the field names are encoded, as described in the [[Conversion Module#Keys|Conversion Module]], and used as element names.** Otherwise, if {{Code|format}} is {{Code|attributes}}, the field names will be stored in name attributes. '''A little advice''': in the Database Creation dialog of the GUI, if you select CSV Parsing and switch to the ''Parsing'' tab, you can see the effects of some of the conversion options. ===XQuery=== With the {{Code|xquery}} format, CSV records are converted to a sequence of arrays: * The resulting value will be a map with a {{Code|records}} and an optional {{Code|names}} key.* Records are organized as a sequence of arrays. A single array contains the entries of a single record.* The column names will be available if {{Code|header }} option is parsedset to {{Code|true}}. The CSV map can e.g. be accessed as follows: * <code>$csv?records[5]</code> returns all entries of the 5th record (row)* <code>$csv?records(2)</code> returns all entries of the 2nd field (column)* <code>$csv?names?*</code> returns the names of all fields (if available)* Return enumerated strings for all records:<syntaxhighlight lang="xquery">for $record at $pos in $csv?recordsreturn $pos || ". " || string-join($record?*, ', ') named by </syntaxhighlight> The resulting representation consumes less memory than XML-based formats, and values can be directly accessed without conversion. Thus, it is recommendable for very large inputs and for efficient ad-hoc processing. ==Options== In the corresponding following table, all available options are listed. The Excel column nameindicates what are the preferred options for data that is to be imported, or has been exported from Excel. {| class="wikitable sortable" width="100%"|- valign="top"! width="140" | Option! width="50%" | Description! Allowed! Default! Excel|- valign="top"| {{Code|separator}}| Defines the character which separates the values of a single record.| {{Code|comma}}, {{Code|semicolon}}, {{Code|colon}}, {{Code|tab}}, {{Code|space}} or a ''single character''| {{Code|comma}}| {{Code|semicolon}}|- valign="top"| {{Code|header}}| Indicates if the first line of the parsed or serialized CSV data is a table header.| {{Code|yes}}, {{Code|no}}| {{Code|no}}||- valign="top"| {{Code|format}}| Specifies the format of the XML data:<br/>## Empty * With {{Code|direct}} conversion, field names are represented by as element names* With {{Code|attributes}} conversion, field names are stored in {{Code|name}} attributes* With {{Code|xquery}} conversion, the input is converted to an XQuery map| {{Code|direct}}, {{Code|attributes}}, {{Code|xquery}}| {{Code|direct}}||- valign="top"| {{Code|lax}}| Specifies if a single underscore (lax approach is used to convert QNames to JSON names.| {{Code|yes}}, {{Code|no}}| {{Code|yes}}| {{Code|no}}|- valign="top"| {{Code|&lt;_&gt;quotes}}| Specifies how quotes are parsed:* Parsing: If the option is enabled, quotes at the start and end of a value will be treated as control characters.Separators and newlines within the quotes will be adopted without change.* Serialization: If the option is enabled, the value will be wrapped with quotes if it contains characters that might be treated as control characters.&lt;/_&gt;A quote character in the value will be encoded according to the rules of the {{Code|backslashes}})option.## Underscore | {{Code|yes}}, {{Code|no}}| {{Code|yes}}| {{Code|yes}}|- valign="top"| {{Code|backslashes}}| Specifies how quotes and other characters are rewritten to escaped:* Parsing: If the option is enabled, {{Code|\r}}, {{Code|n}} and {{Code|\t}} will be replaced with the corresponding control characters. All other escaped characters will be adopted as literals (e.g.: {{Code|\"}} → {{Code|"}}). If the option is disabled, two underscores consecutive quotes will be replaced with a single quote (unless {{Code|__quotes}}is enabled and the quote is the first or last character of a value).## A * Serialization: If the option is enabled, {{Code|\r}}, {{Code|n}}, {{Code|\t}}, {{Code|"}} and the separator character that cannot will be represented as NCName character encoded with a backslash. If the option is rewritten to an underscore and its four-digit Unicodedisabled, quotes will be duplicated.| {{Code|yes}}, {{Code|no}}| {{Code|no}}| {{Code|no}}|}
=Functions=
==csv:parse==
 
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|csv:parse|$input string as xs:string?|elementitem(csv)?}}<br/>{{Func|csv:parse|$input string as xs:string?, $options as itemmap(*)?|elementitem(csv)?}}
|-
| '''Summary'''
|Converts the CSV data specified by {{Code|$inputstring}} to XML, and returns the result as {{Code|element(csv)}} an XQuery value.<br/>The {{Code|$options}} argument can be used to control the way the input is converted. The following options are available:* {{Code|separator}} defines the character which separates columns in a row. By default, this is a comma ({{Code|,}}).* {{Code|header}} specifies if the input contains a header row. The default value is {{Code|false}}.Options can either be specified<br />* as children of an {{Code|<csv:options/>}} element; e.g.:<pre class="brush:xml"><csv:options> <csv:separator value=';'/> ...</csv:options></pre>* or as map, which contains all key/value pairs:<pre class="brush:xquery">{ 'separator' : ';', ... }</pre>
|-
| '''Errors'''
|{{Error|BXCS0001parse|#Errors}} the specified input cannot be parsed as CSV document.|} ==csv:doc== {| width='100%'|-| width='120' | '''Signatures'''|{{Func|csv:doc|$uri as xs:string?|item()?}}<br />{{Func|csv:doc|$uri as xs:string?, $options as map(*)?|item()?}}<br />|-| '''Summary'''|Fetches the CSV document referred to by the given {{Code|$uri}} and converts it to an XQuery value. The {{Code|$options}} argument can be used to control the way the input is converted.|-| '''Errors'''|{{Error|parse|#Errors}} the specified input cannot be parsed as CSV document.<br/>{{Error|BXCS0003options|#Errors}} the specified separator must be a single characteroptions are conflicting.
|}
==csv:serialize==
 
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|csv:serialize|$input as nodeitem()?|xs:string}}<br/>{{Func|csv:serialize|$input as item()?, $options as itemmap(*)?|xs:string}}
|-
| '''Summary'''
|Serializes the node specified by {{Code|$input}} as CSV data, and returns using the result as specified {{Code|xs:string$options}}, and returns the result as string.<br />XML documents Values can also be serialized as CSV if with the standard [[Serialization|Serialization Option]] feature of XQuery:* The parameter {{Code|method}} is needs to be set to {{Code|csv}}.<br/>The {{Code|$options}} argument can be used to control the way the node is serialized. The following options are available:, and* {{Code|separator}} defines the character which separates columns options presented in a row. By default, this is a comma ({{Code|,}}).* {{Code|header}} specifies if the input element names are article need to be interpreted as header names. The default value is assigned to the {{Code|false}}.Options can either be specified<br />* as children of an {{Code|<csv:options/>}} element; e.g.:<pre class="brush:xml"><csv:options> <csv:separator value=';'/> ..parameter.</csv:options></pre>* or as map, which contains all key/value pairs:<pre class="brush:xquery">{ 'separator' : ';', ... }</pre>
|-
| '''Errors'''
|{{Error|BXCS0002serialize|#Errors}} the input cannot be serialized.<br/>{{Error|BXCS0003|#Errors}} the specified separator must be a single character.
|}
 
=Examples=
 
'''Example 1:''' Converts CSV data to XML, interpreting the first row as table header:
 
'''Input''' {{Code|addressbook.csv}}:
<syntaxhighlight lang="xml">
Name,First Name,Address,City
Huber,Sepp,Hauptstraße 13,93547 Hintertupfing
</syntaxhighlight>
 
'''Query:'''
<syntaxhighlight lang="xquery">
let $text := file:read-text('addressbook.csv')
return csv:parse($text, map { 'header': true() })
</syntaxhighlight>
 
'''Result:'''
<syntaxhighlight lang="xml">
<csv>
<record>
<Name>Huber</Name>
<First_Name>Sepp</First_Name>
<Address>Hauptstraße 13</Address>
<City>93547 Hintertupfing</City>
</record>
</csv>
</syntaxhighlight>
'''
 
'''Example 2:''' Converts some CSV data to XML and back, and checks if the input and output are equal. The expected result is {{Code|true}}:
 
'''Query:'''
<syntaxhighlight lang="xquery">
let $options := map { 'lax': false() }
let $input := file:read-text('some-data.csv')
let $output := $input => csv:parse($options) => csv:serialize($options)
return $input eq $output
</syntaxhighlight>
 
'''Example 3:''' Converts CSV data to XQuery and returns distinct column values:
 
'''Query:'''
<syntaxhighlight lang="xquery">
let $text := ``[Name,City
Jack,Chicago
Jack,Washington
John,New York
]``
let $options := map { 'format': 'xquery', 'header': true() }
let $csv := csv:parse($text, $options)
return (
'Distinct values:',
let $records := $csv('records')
for $name at $pos in $csv('names')?*
let $values := $records?($pos)
return (
'* ' || $name || ': ' || string-join(distinct-values($values), ', ')
)
)
</syntaxhighlight>
 
'''Result:'''
<syntaxhighlight lang="xquery">
Distinct values:
* Name: Jack, John
* City: Chicago, Washington, New York
</syntaxhighlight>
=Errors=
|Description
|-
|{{Code|BXCS0001parse}}| The input cannot be convertedparsed.
|-
|{{Code|BXCS0002serialize}}
| The node cannot be serialized.
|-
|{{Code|BXCS0001}}
| The specified separator must be a single character.
|}
=Changelog=
 
;Version 9.4
* Added: [[#csv:doc|csv:doc]]
 
; Version 9.1
* Updated: [[#csv:parse|csv:parse]] can be called with empty sequence.
 
;Version 9.0
 
* Added: {{Code|xquery}} option
* Removed: {{Code|map}} option
* Updated: error codes updated; errors now use the module namespace
 
;Version 8.6
 
* Updated: [[#Options|Options]]: improved Excel compatibility
 
;Version 8.0
 
* Added: {{Code|backslashes}} option
 
;Version 7.8
 
* Updated: [[#csv:parse|csv:parse]] now returns a document node instead of an element, or an XQuery map if {{Code|format}} is set to {{Code|map}}.
* Added: {{Code|format}} and {{Code|lax}} options
The module was introduced with Version 7.7.2.
 
[[Category:XQuery]]
Bureaucrats, editor, reviewer, Administrators
13,550

edits

Navigation menu