Changes

Jump to navigation Jump to search
1,790 bytes added ,  10:06, 4 December 2019
no edit summary
=Conventions=
All functions and errors in this module are assigned to the {{Code|<code><nowiki>http://basex.org/modules/csv}} </nowiki></code> namespace, which is statically bound to the {{Code|csv}} prefix.<br/>All errors are assigned to the {{Code|http://basex.org/errors}} namespace, which is statically bound to the {{Code|bxerr}} prefix.
==Conversion==
===XML: Direct, Attributes===
If the {{Code|direct}} or {{Code|attributes}} format is chosen, a CSV string is converted to XML as follows:
* The resulting XML document has a {{Code|<csv>}} root element.
* If {{Code|format}} is set to {{Code|attributes}}, field names will be stored in name attributes.
'''A little advice''': in the Database Creation dialog of the GUI, if you select CSV Parsing and switch to the ''Parsing'' tab, you can see the effects of some of the conversion options. ===MapXQuery=== With the {{Code|xquery}} format, CSV records are converted to a sequence of arrays: * The resulting value will be a map with a {{Code|records}} and an optional {{Code|names}} key.* Records are organized as a sequence of arrays. A single array contains the entries of a single record.* The column names will be available if {{Code|header}} option is set to {{Code|true}}.
If {{Code|format}} is set to {{Code|The CSV map}}, the CSV data will can e.g. be converted to an XQuery mapaccessed as follows:
* All <code>$csv?records are enumerated with positive integers.[5]</code> returns all entries of the 5th record (row)* By default, <code>$csv?records(2)</code> returns all entries of a the 2nd field (column)* <code>$csv?names?*</code> returns the names of all fields (if available)* Return enumerated strings for all records are represented :<pre class="brush:xquery">for $record at $pos in a sequence$csv?recordsreturn $pos || ".* If the {{Code" |header}} option is set to {{Code|true}}string-join($record?*, a map is created', which contains all field names and its values.')</pre>
'''A little advice''': in the Database Creation dialog of the GUIThe resulting representation consumes less memory than XML-based formats, if you select CSV Parsing and switch to the ''Parsing'' tab, you values can see the effects of some of the be directly accessed without conversion options. Thus, it is recommendable for very large inputs and for efficient ad-hoc processing.
==Options==
The In the following table, all available options are available:listed. The Excel column indicates what are the preferred options for data that is to be imported, or has been exported from Excel.
{| class="wikitable sortable" width="100%"
! Allowed
! Default
! Excel
|- valign="top"
| {{Code|separator}}
| Defines the character which separates the entries values of a single record in a single line.
| {{Code|comma}}, {{Code|semicolon}}, {{Code|colon}}, {{Code|tab}}, {{Code|space}} or a ''single character''
| {{Code|comma}}
| {{Code|semicolon}}
|- valign="top"
| {{Code|header}}
| {{Code|yes}}, {{Code|no}}
| {{Code|no}}
|
|- valign="top"
| {{Code|format}}
| Specifies the format of the XML data. The format is only relevant if the {{Code|header}} option is activated:<br/>
* With {{Code|direct}} conversion, field names are represented as element names
* With {{Code|attributes}} conversion, field names are stored in {{Code|name}} attributes
* With {{Code|mapxquery}} conversion, the input is converted to an XQuery map| {{Code|direct}}, {{Code|attributes}}, {{Code|mapxquery}}
| {{Code|direct}}
|
|- valign="top"
| {{Code|lax}}
| {{Code|yes}}, {{Code|no}}
| {{Code|yes}}
| {{Code|no}}
|- valign="top"
| {{Code|quotes}}
| Specifies if how quotes should be are parsed in :* Parsing: If the option is enabled, quotes at the input start and end of a value will be treated as control characters. Separators and generated newlines within the quotes will be adopted without change.* Serialization: If the option is enabled, the value will be wrapped with quotes if it contains characters that might be treated as control characters. A quote character in the outputvalue will be encoded according to the rules of the {{Code|backslashes}} option.
| {{Code|yes}}, {{Code|no}}
| {{Code|yes}}
| {{Code|yes}}
|- valign="top"
| {{Code|backslashes}}
| Specifies if how quotes and other characters are escaped by backslashes:* Parsing: If the option is enabled, {{Code|\r}}, {{Code|n}} and {{Code|\t}} will be replaced with the corresponding control characters. OtherwiseAll other escaped characters will be adopted as literals (e.g.: {{Code|\"}} → {{Code|"}}). If the option is disabled, two consecutive quotes will be replaced with a double single quote (unless {{Code|quotes}} is enabled and the quote is the first or last character of a value).* Serialization: If the option is enabled, {{Code|\r}}, {{Code|n}}, {{Code|\t}}, {{Code|"}} and the separator character will be encoded by with a second consecutive quotebackslash. If the option is disabled, quotes will be duplicated.
| {{Code|yes}}, {{Code|no}}
| {{Code|no}}
| {{Code|no}}
|}
 
The CSV function signatures provide an {{Code|$options}} argument. Options can either be specified
* as children of an {{Code|<csv:options/>}} element; e.g.:
<pre class="brush:xml">
<csv:options>
<csv:separator value=';'/>
...
</csv:options>
</pre>
* or as map, which contains all key/value pairs:
<pre class="brush:xquery">
map { 'separator': ';', ... }
</pre>
=Functions=
|-
| width='120' | '''Signatures'''
|{{Func|csv:parse|$input string as xs:string?|document-node(elementitem(csv))?}}<br/>{{Func|csv:parse|$input string as xs:string?, $options as itemmap(*)?|item()?}}
|-
| '''Summary'''
|Converts the CSV data specified by {{Code|$inputstring}} to an XML document or a mapXQuery value. The {{Code|$options}} argument can be used to control the way the input is converted.
|-
| '''Errors'''
|{{Error|BXCS0001parse|#Errors}} the input cannot be parsed.
|}
==csv:serialize==
 
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|csv:serialize|$input as nodeitem()?|xs:string}}<br/>{{Func|csv:serialize|$input as nodeitem()?, $options as itemmap(*)?|xs:string}}
|-
| '''Summary'''
|Serializes the node specified by {{Code|$input}} as CSV data, and returns using the result as specified {{Code|xs:string$options}}, and returns the result as string. Items Values can also be serialized as JSON if CSV with the standard [[Serialization|Serialization Parameter]] feature of XQuery:* The parameter {{Code|method}} is needs to be set to {{Code|csv}}.<br/>The , and* the options presented in this article need to be assigned to the {{Code|$optionscsv}} argument can be used to control the way the input is serializedparameter.
|-
| '''Errors'''
|{{Error|BXCS0002serialize|#Errors}} the input cannot be serialized.
|}
'''Query:'''
<pre class="brush:xquery">
let $text options := map { 'lax': false() }let $input := file:read-text('some-data.csv')let $options output := map { 'lax': 'no' }let $xml :input = > csv:parse($text, $options)let $csv := > csv:serialize($xml, $options)return $text input eq $csvoutput
</pre>
'''Example 3:''' Converts CSV data to an XQuery map item and serializes its contentsreturns distinct column values:
'''Query:'''
<pre class="brush:xquery">
let $text := "``[Name;,City" || outJack,ChicagoJack,WashingtonJohn,New York]``let $options := map { 'format':nl() || "John;Newton" || out'xquery', 'header':nltrue() || "Jack;Oldtown"}let $options csv := <csv:parse($text, $options>)return ( <csv 'Distinct values:separator value=';'/>, <csv let $records :format value=$csv('maprecords'/>) < for $name at $pos in $csv:header value=('yesnames'/>)?* </csvlet $values :options>= $records?($pos) return csv( '* ' || $name || ':parse' || string-join(distinct-values($textvalues), ', $options') ))
</pre>
'''Result:'''
<pre class="brush:xmlxquery">{ 1Distinct values: { "City"* Name: "Newton"Jack, "Name": "John" }, 2: { "* City": "Oldtown"Chicago, Washington, "Name": "Jack" }}New York
</pre>
|Description
|-
|{{Code|BXCS0001parse}}
| The input cannot be parsed.
|-
|{{Code|BXCS0002serialize}}
| The node cannot be serialized.
|}
=Changelog=
 
; Version 9.1
* Updated: [[#csv:parse|csv:parse]] can be called with empty sequence.
 
;Version 9.0
 
* Added: {{Code|xquery}} option
* Removed: {{Code|map}} option
* Updated: error codes updated; errors now use the module namespace
 
;Version 8.6
 
* Updated: [[#Options|Options]]: improved Excel compatibility
;Version 8.0
The module was introduced with Version 7.7.2.
 
[[Category:XQuery]]
Bureaucrats, editor, reviewer, Administrators
13,550

edits

Navigation menu