Changes

2,236 bytes added ,  16:18, 11 February 2017
=Conventions=
All functions in this module are assigned to the <code><nowiki>http://basex.org/modules/csv</nowiki></code> namespace, which is statically bound to the {{Code|csv}} prefix.<br/>All errors are assigned to the <code><nowiki>http://basex.org/errors</nowiki></code> namespace, which is statically bound to the {{Code|bxerr}} prefix.
==Conversion==

===XML: Direct, Attributes===
CSV is converted to XML as follows:
* The resulting XML document has a {{Code|<csv>}} root element.
* Rows are represented via {{Code|<record>}} elements.
* Fields are represented via {{Code|<entry>}} elements. The value of a field is represented as a text node.
* If the {{Code|header}} option is set to {{Code|true}}, the first text line is parsed as table header, and the {{Code|<entry>}} elements are replaced with the field names:
** Empty names are represented by a single underscore ({{Code|_}}), and characters that are not valid in element names are replaced with underscores or (when invalid as first character of an element name) prefixed with an underscore.
** If the {{Code|lax}} option is set to {{Code|false}}, invalid characters will be rewritten to an underscore and the character’s four-digit Unicode code point, and underscores will be represented as two underscores ({{Code|__}}). The resulting element names may be less readable, but can always be converted back to the original field names.
* If {{Code|format}} is set to {{Code|attributes}}, field names will be stored in {{Code|name}} attributes.

===Map===

If {{Code|format}} is set to {{Code|map}}, the CSV data will be converted to an XQuery map:

* All records are enumerated with positive integers.
* By default, all entries of a record are represented in a sequence.
* If the {{Code|header}} option is set to {{Code|true}}, a map is created, which contains all field names and their values.

'''A little advice''': in the Database Creation dialog of the GUI, if you select CSV parsing and switch to the ''Parsing'' tab, you can see the effects of some of the conversion options.

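
The conversion rules can be tried out with a small query. The following is a minimal sketch that assumes a hypothetical two-line input; the element shapes mentioned in the comments are those implied by the rules above, not captured output:

<pre class="brush:xquery">
(: hypothetical sample input: one header line, one record :)
let $text := "Name,City" || out:nl() || "John,Newton"
return (
  (: direct (default): field names become element names, e.g. <Name>John</Name> :)
  csv:parse($text, map { 'header': true() }),
  (: attributes: field names are stored in name attributes, e.g. <entry name="Name">John</entry> :)
  csv:parse($text, map { 'header': true(), 'format': 'attributes' })
)
</pre>

Comparing both results side by side shows which representation is easier to query for a given data set.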
==Options==
{{Mark|Updated with Version 8.6}}: improved Excel compatibility

In the following table, all available options are listed. The Excel column indicates the preferred options for data that is to be imported to, or has been exported from, Excel.

{| class="wikitable sortable" width="100%"
! Option
! Description
! Allowed
! Default
! Excel
|- valign="top"
| {{Code|separator}}
| Defines the character which separates the values of a single record.
| {{Code|comma}}, {{Code|semicolon}}, {{Code|colon}}, {{Code|tab}}, {{Code|space}} or a ''single character''
| {{Code|comma}}
| {{Code|semicolon}}
|- valign="top"
| {{Code|header}}
| Indicates whether the first line of the CSV data is a table header.
| {{Code|yes}}, {{Code|no}}
| {{Code|no}}
|
|- valign="top"
| {{Code|format}}
| Specifies the conversion format:
* With {{Code|direct}} conversion, field names are represented as element names
* With {{Code|attributes}} conversion, field names are stored in {{Code|name}} attributes
* With {{Code|map}} conversion, the input is converted to an XQuery map
| {{Code|direct}}, {{Code|attributes}}, {{Code|map}}
| {{Code|direct}}
|
|- valign="top"
| {{Code|lax}}
| Specifies if a lax approach is used to convert field names to element names.
| {{Code|yes}}, {{Code|no}}
| {{Code|yes}}
| {{Code|no}}
|- valign="top"
| {{Code|quotes}}
| Specifies how quotes are parsed:
* Parsing: If the option is enabled, quotes at the start and end of a value will be treated as control characters. Separators and newlines within the quotes will be adopted without change.
* Serialization: If the option is enabled, the value will be wrapped with quotes. A quote character in the value will be encoded according to the rules of the {{Code|backslashes}} option.
| {{Code|yes}}, {{Code|no}}
| {{Code|yes}}
| {{Code|yes}}
|- valign="top"
| {{Code|backslashes}}
| Specifies how quotes and other characters are escaped:
* Parsing: If the option is enabled, {{Code|\r}}, {{Code|\n}} and {{Code|\t}} will be replaced with the corresponding control characters. All other escaped characters will be adopted as literals (e.g.: {{Code|\"}} → {{Code|"}}). If the option is disabled, two consecutive quotes will be replaced with a single quote (unless {{Code|quotes}} is enabled and the quote is the first or last character of a value).
* Serialization: If the option is enabled, {{Code|\r}}, {{Code|\n}}, {{Code|\t}}, {{Code|"}} and the separator character will be encoded with a backslash. If the option is disabled, quotes will be duplicated.
| {{Code|yes}}, {{Code|no}}
| {{Code|no}}
| {{Code|no}}
|}
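
The effect of the {{Code|quotes}} option can be checked with a one-line input. This is a minimal sketch with a hypothetical value that contains the separator; going by the description in the table, the query should return two strings, the first one including the comma:

<pre class="brush:xquery">
(: hypothetical input: the first value is quoted and contains the separator :)
let $line := '"Newton, MA",John'
let $csv := csv:parse($line, map { 'quotes': true() })
(: without a header, fields are represented as <entry> elements :)
return $csv//entry ! string()
</pre>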
 
The CSV function signatures provide an {{Code|$options}} argument. Options can either be specified
* as children of an {{Code|<csv:options/>}} element; e.g.:
<pre class="brush:xml">
<csv:options>
<csv:separator value=';'/>
...
</csv:options>
</pre>
* or as a map, which contains all key/value pairs:
<pre class="brush:xquery">
map { 'separator': ';', ... }
</pre>
</pre>
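
As an illustration, the values from the Excel column of the table can be collected in a single options map. This is only a sketch: the file name {{Code|export.csv}} is hypothetical, and the string values {{Code|yes}} and {{Code|no}} are taken from the Allowed column:

<pre class="brush:xquery">
(: options taken from the Excel column of the table above :)
let $options := map {
  'separator': 'semicolon',
  'lax': 'no',
  'quotes': 'yes',
  'backslashes': 'no'
}
(: 'export.csv' is a hypothetical file name :)
return csv:parse(file:read-text('export.csv'), $options)
</pre>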
=Functions=
==csv:parse==
 
{{Version|7.8}}: the return type has been changed from {{Code|element(<csv>)}} to {{Code|document-node(element(<csv>))}}, and the {{Code|format}} and {{Code|lax}} options have been added.
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|csv:parse|$input as xs:string|document-node(element(csv))}}<br/>{{Func|csv:parse|$input as xs:string, $options as map(xs:string, item())|item()}}
|-
| '''Summary'''
|Converts the CSV data specified by {{Code|$input}} to an XML document or a map. The {{Code|$options}} argument can be used to control the way the input is converted.
|-
| '''Errors'''
==csv:serialize==
 
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|csv:serialize|$input as node()|xs:string}}<br/>{{Func|csv:serialize|$input as node(), $options as map(xs:string, item())|xs:string}}
|-
| '''Summary'''
|Serializes the node specified by {{Code|$input}} as CSV data, and returns the result as {{Code|xs:string}}.<br/>Items can also be serialized as CSV if the [[Serialization|Serialization Parameter]] {{Code|method}} is set to {{Code|csv}}.<br/>The {{Code|$options}} argument can be used to control the way the input is serialized.
|-
| '''Errors'''
<pre class="brush:xquery">
let $text := file:read-text('addressbook.csv')
return csv:parse($text, map { 'header': true() })
</pre>
'''Query:'''
<pre class="brush:xquery">
let $options := map { 'lax': false() }
let $input := file:read-text('some-data.csv')
let $output := $input => csv:parse($options) => csv:serialize($options)
return $input eq $output
</pre>

'''Example 3:''' Converts CSV data to an XQuery map item and serializes its contents:

'''Query:'''
<pre class="brush:xquery">
let $text := "Name;City" || out:nl() || "John;Newton" || out:nl() || "Jack;Oldtown"
let $options := map { 'separator': ';', 'format': 'map', 'header': true() }
return csv:parse($text, $options)
</pre>

'''Result:'''
<pre class="brush:xquery">
map {
  1: map { "City": "Newton", "Name": "John" },
  2: map { "City": "Oldtown", "Name": "Jack" }
}
</pre>
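
Serialization can also be tried on its own. The following sketch assumes a hypothetical {{Code|<csv>}} fragment shaped like the output of [[#csv:parse|csv:parse]] with the {{Code|header}} option enabled, and writes it back using a semicolon as separator:

<pre class="brush:xquery">
(: hypothetical input: two records with named fields :)
let $xml :=
  <csv>
    <record><Name>John</Name><City>Newton</City></record>
    <record><Name>Jack</Name><City>Oldtown</City></record>
  </csv>
(: header: emit the field names as first line; separator: ';' between values :)
return csv:serialize($xml, map { 'header': true(), 'separator': ';' })
</pre>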
=Changelog=
 
;Version 8.6
 
* Updated: [[#Options|Options]]: improved Excel compatibility
 
;Version 8.0
 
* Added: {{Code|backslashes}} option
;Version 7.8
* Updated: [[#csv:parse|csv:parse]] now returns a document node instead of an element, or an XQuery map if {{Code|format}} is set to {{Code|map}}.
* Added: {{Code|format}} and {{Code|lax}} options
The module was introduced with Version 7.7.2.
 
[[Category:XQuery]]