Changes

Jump to navigation Jump to search
779 bytes added ,  18:51, 18 November 2020
This [[Module Library|XQuery Module]] contains a single function to parse CSV input. [httphttps://en.wikipedia.org/wiki/Comma-separated_values CSV] (comma-separated values) is a popular representation for tabular data, exported e. g. from Excel.
=Conventions=
All functions and errors in this module are assigned to the <code><nowiki>http://basex.org/modules/csv</nowiki></code> namespace, which is statically bound to the {{Code|csv}} prefix.<br/>All errors are assigned to the <code><nowiki>http://basex.org/errors</nowiki></code> namespace, which is statically bound to the {{Code|bxerr}} prefix.
==Conversion==
===XML: Direct, Attributes===
If the {{Code|direct}} or {{Code|attributes}} format is chosen, a CSV string is converted to XML as follows:
* The resulting XML document has a {{Code|<csv>}} root element.* Rows are represented via {{Code|<record>}} elements.* Fields are represented via {{Code|<entry>}} elements. The value of a field is represented as text node.* If the {{Code|header}} option is set to {{Code|true}}, the first text line is parsed as table header, and the {{Code|<entry>}} elements are replaced with the field names:** Empty names are represented by a single underscore ({{Code|_}}), and characters that are not valid in element names are replaced with underscores or (when invalid as first character of an element name) prefixed with an underscore.** If the {{Code|laxformat}} option is set to {{Code|falsedirect}}, invalid characters will be rewritten to an underscore and the character’s four-digit Unicodefield names are encoded, as described in the [[Conversion Module#Keys|Conversion Module]], and underscores will be represented used as two underscores ({{Code|__}}). The resulting element names may be less readable, but can always be converted back to the original field names.* If * Otherwise, if {{Code|format}} is set to {{Code|attributes}}, the field names will be stored in name attributes.
'''A little advice''': in the Database Creation dialog of the GUI, if you select CSV Parsing and switch to the ''Parsing'' tab, you can see the effects of some of the conversion options.
===XQuery===
With the {{MarkCode|Introduced with Version 9.0xquery}}: format, CSV records can be are converted to a sequence of arrays:
* The resulting value will be a map with a {{Code|records}} and an optional {{Code|names}} key.
* The column names will be available if {{Code|header}} option is set to {{Code|true}}.
The CSV map can e.g. be accessed as follows: * <code>$csv?records[5]</code> returns all entries of the 5th record (row)* <code>$csv?records(2)</code> returns all entries of the 2nd field (column)* <code>$csv?names?*</code> returns the names of all fields (if available)* Return enumerated strings for all records:<syntaxhighlight lang="xquery">for $record at $pos in $csv?recordsreturn $pos || ". " || string-join($record?*, ', ')</syntaxhighlight> The resulting representation will consume consumes less memory than XML-based formats, and values can be directly accessed without conversion. Thus, it is recommendable for very large inputs and for efficient ad-hoc processing.
==Options==
 
{{Mark|Updated with Version 8.6}}: improved Excel compatibility
In the following table, all available options are listed. The Excel column indicates what are the preferred options for data that is to be imported, or has been exported from Excel.
* With {{Code|attributes}} conversion, field names are stored in {{Code|name}} attributes
* With {{Code|xquery}} conversion, the input is converted to an XQuery map
| {{Code|direct}}, {{Code|attributes}}, {{Code|mapxquery}}
| {{Code|direct}}
|
| Specifies how quotes are parsed:
* Parsing: If the option is enabled, quotes at the start and end of a value will be treated as control characters. Separators and newlines within the quotes will be adopted without change.
* Serialization: If the option is enabled, the value will be wrapped with quotesif it contains characters that might be treated as control characters. A quote character in the value will be encoded according to the rules of the {{Code|backslashes}} option.
| {{Code|yes}}, {{Code|no}}
| {{Code|yes}}
|-
| width='120' | '''Signatures'''
|{{Func|csv:parse|$input string as xs:string?|document-nodeitem(element(csv))?}}<br/>{{Func|csv:parse|$input string as xs:string?, $options as map(*)?|item()?}}|-| '''Summary'''|Converts the CSV {{Code|$string}} to an XQuery value. The {{Code|$options}} argument can be used to control the way the input is converted.|-| '''Errors'''|{{Error|parse|#Errors}} the specified input cannot be parsed as CSV document.|} ==csv:doc== {| width='100%'|-| width='120' | '''Signatures'''|{{Func|csv:doc|$uri as xs:string, ?|item()?}}<br />{{Func|csv:doc|$uri as xs:string?, $options as map(*)?|item()?}}<br />
|-
| '''Summary'''
|Converts Fetches the CSV data specified document referred to by the given {{Code|$inputuri}} and converts it to an XML document or a mapXQuery value. The {{Code|$options}} argument can be used to control the way the input is converted.
|-
| '''Errors'''
|{{Error|BXCS0001parse|#Errors}} the specified input cannot be parsedas CSV document.<br/>{{Error|options|#Errors}} the specified options are conflicting.
|}
|-
| width='120' | '''Signatures'''
|{{Func|csv:serialize|$input as nodeitem()?|xs:string}}<br/>{{Func|csv:serialize|$input as nodeitem()?, $options as map(xs:string, item()*)?|xs:string}}
|-
| '''Summary'''
|Serializes the node specified by {{Code|$input}} as CSV data, and returns using the result as specified {{Code|xs:string$options}}, and returns the result as string. Items Values can also be serialized as JSON if CSV with the standard [[Serialization|Serialization Parameter]] feature of XQuery:* The parameter {{Code|method}} is needs to be set to {{Code|csv}}.<br/>The , and* the options presented in this article need to be assigned to the {{Code|$optionscsv}} argument can be used to control the way the input is serializedparameter.
|-
| '''Errors'''
|{{Error|BXCS0002serialize|#Errors}} the input cannot be serialized.
|}
'''Input''' {{Code|addressbook.csv}}:
<pre classsyntaxhighlight lang="brush:xml">
Name,First Name,Address,City
Huber,Sepp,Hauptstraße 13,93547 Hintertupfing
</presyntaxhighlight>
'''Query:'''
<pre classsyntaxhighlight lang="brush:xquery">
let $text := file:read-text('addressbook.csv')
return csv:parse($text, map { 'header': true() })
</presyntaxhighlight>
'''Result:'''
<pre classsyntaxhighlight lang="brush:xml">
<csv>
<record>
</record>
</csv>
</presyntaxhighlight>
'''
'''Query:'''
<pre classsyntaxhighlight lang="brush:xquery">
let $options := map { 'lax': false() }
let $input := file:read-text('some-data.csv')
let $output := $input => csv:parse($options) => csv:serialize($options)
return $input eq $output
</presyntaxhighlight>
'''Example 3:''' Converts CSV data to XQuery and returns distinct column values:
'''Query:'''
<pre classsyntaxhighlight lang="brush:xquery">
let $text := ``[Name,City
Jack,Chicago
]``
let $options := map { 'format': 'xquery', 'header': true() }
let $data csv := csv:parse($text, $options)
return (
'Distinct values:',
let $records := $data?csv('records') for $name at $pos in $data?csv('names')?*
let $values := $records?($pos)
return (
)
)
</presyntaxhighlight>
'''Result:'''
<pre classsyntaxhighlight lang="brush:xquery">
Distinct values:
* Name: Jack, John
* City: Chicago, Washington, New York
</presyntaxhighlight>
=Errors=
|Description
|-
|{{Code|BXCS0001parse}}
| The input cannot be parsed.
|-
|{{Code|BXCS0002serialize}}
| The node cannot be serialized.
|}
=Changelog=
 
;Version 9.4
* Added: [[#csv:doc|csv:doc]]
 
; Version 9.1
* Updated: [[#csv:parse|csv:parse]] can be called with empty sequence.
;Version 9.0
* Added: {{Code|xquery}} option
* Removed: {{Code|map}} option
* Updated: error codes updated; errors now use the module namespace
;Version 8.6
Bureaucrats, editor, reviewer, Administrators
13,550

edits

Navigation menu