Changes

Jump to navigation Jump to search
199 bytes added ,  18:51, 18 November 2020
This [[Module Library|XQuery Module]] contains a single function to parse CSV input. [httphttps://en.wikipedia.org/wiki/Comma-separated_values CSV] (comma-separated values) is a popular representation for tabular data, exported e. g. from Excel.
=Conventions=
 
{{Mark|Updated with Version 9.0:}}
All functions and errors in this module are assigned to the <code><nowiki>http://basex.org/modules/csv</nowiki></code> namespace, which is statically bound to the {{Code|csv}} prefix.<br/>
If the {{Code|direct}} or {{Code|attributes}} format is chosen, a CSV string is converted to XML:
* The resulting XML document has a {{Code|<csv>}} root element.* Rows are represented via {{Code|<record>}} elements.* Fields are represented via {{Code|<entry>}} elements. The value of a field is represented as text node.* If the {{Code|header}} option is set to {{Code|true}}, the first text line is parsed as table header, and the {{Code|<entry>}} elements are replaced with the field names:** Empty names are represented by a single underscore ({{Code|_}}), and characters that are not valid in element names are replaced with underscores or (when invalid as first character of an element name) prefixed with an underscore.** If the {{Code|laxformat}} option is set to {{Code|falsedirect}}, invalid characters will be rewritten to an underscore and the character’s four-digit Unicodefield names are encoded, as described in the [[Conversion Module#Keys|Conversion Module]], and underscores will be represented used as two underscores ({{Code|__}}). The resulting element names may be less readable, but can always be converted back to the original field names.* If * Otherwise, if {{Code|format}} is set to {{Code|attributes}}, the field names will be stored in name attributes.
'''A little advice''': in the Database Creation dialog of the GUI, if you select CSV Parsing and switch to the ''Parsing'' tab, you can see the effects of some of the conversion options.
===XQuery===
 
This format has been introduced with {{Version|9.0}}. It is more flexible and light-weight than the old, discarded {{Code|map}} format.
With the {{Code|xquery}} format, CSV records are converted to a sequence of arrays:
* <code>$csv?names?*</code> returns the names of all fields (if available)
* Return enumerated strings for all records:
<pre classsyntaxhighlight lang="brush:xquery">
for $record at $pos in $csv?records
return $pos || ". " || string-join($record?*, ', ')
</presyntaxhighlight>
The resulting representation consumes less memory than XML-based formats, and values can be directly accessed without conversion. Thus, it is recommendable for very large inputs and for efficient ad-hoc processing.
* With {{Code|attributes}} conversion, field names are stored in {{Code|name}} attributes
* With {{Code|xquery}} conversion, the input is converted to an XQuery map
| {{Code|direct}}, {{Code|attributes}}, {{Code|mapxquery}}
| {{Code|direct}}
|
| Specifies how quotes are parsed:
* Parsing: If the option is enabled, quotes at the start and end of a value will be treated as control characters. Separators and newlines within the quotes will be adopted without change.
* Serialization: If the option is enabled, the value will be wrapped with quotesif it contains characters that might be treated as control characters. A quote character in the value will be encoded according to the rules of the {{Code|backslashes}} option.
| {{Code|yes}}, {{Code|no}}
| {{Code|yes}}
|-
| width='120' | '''Signatures'''
|{{Func|csv:parse|$input string as xs:string?|document-nodeitem(element(csv))?}}<br/>{{Func|csv:parse|$input string as xs:string?, $options as map(*)?|item()?}}
|-
| '''Summary'''
|Converts the CSV data specified by {{Code|$inputstring}} to an XML document or a mapXQuery value. The {{Code|$options}} argument can be used to control the way the input is converted.
|-
| '''Errors'''
|{{Error|parse|#Errors}} the specified input cannot be parsedas CSV document.|} ==csv:doc== {| width='100%'|-| width='120' | '''Signatures'''|{{Func|csv:doc|$uri as xs:string?|item()?}}<br />{{Func|csv:doc|$uri as xs:string?, $options as map(*)?|item()?}}<br />|-| '''Summary'''|Fetches the CSV document referred to by the given {{Code|$uri}} and converts it to an XQuery value. The {{Code|$options}} argument can be used to control the way the input is converted.|-| '''Errors'''|{{Error|parse|#Errors}} the specified input cannot be parsed as CSV document.<br/>{{Error|options|#Errors}} the specified options are conflicting.
|}
'''Input''' {{Code|addressbook.csv}}:
<pre classsyntaxhighlight lang="brush:xml">
Name,First Name,Address,City
Huber,Sepp,Hauptstraße 13,93547 Hintertupfing
</presyntaxhighlight>
'''Query:'''
<pre classsyntaxhighlight lang="brush:xquery">
let $text := file:read-text('addressbook.csv')
return csv:parse($text, map { 'header': true() })
</presyntaxhighlight>
'''Result:'''
<pre classsyntaxhighlight lang="brush:xml">
<csv>
<record>
</record>
</csv>
</presyntaxhighlight>
'''
'''Query:'''
<pre classsyntaxhighlight lang="brush:xquery">
let $options := map { 'lax': false() }
let $input := file:read-text('some-data.csv')
let $output := $input => csv:parse($options) => csv:serialize($options)
return $input eq $output
</presyntaxhighlight>
'''Example 3:''' Converts CSV data to XQuery and returns distinct column values:
'''Query:'''
<pre classsyntaxhighlight lang="brush:xquery">
let $text := ``[Name,City
Jack,Chicago
let $records := $csv('records')
for $name at $pos in $csv('names')?*
let $values := $records?($pos)
return (
'* ' || $name || ': ' || string-join(distinct-values($values), ', ')
)
)
</presyntaxhighlight>
'''Result:'''
<pre classsyntaxhighlight lang="brush:xquery">
Distinct values:
* Name: Jack, John
* City: Chicago, Washington, New York
</presyntaxhighlight>
=Errors=
 
{{Mark|Updated with Version 9.0:}}
{| class="wikitable" width="100%"
=Changelog=
 
;Version 9.4
* Added: [[#csv:doc|csv:doc]]
 
; Version 9.1
* Updated: [[#csv:parse|csv:parse]] can be called with empty sequence.
;Version 9.0
Bureaucrats, editor, reviewer, Administrators
12,227

edits

Navigation menu