CSV Module
Revision as of 00:10, 18 October 2013
This XQuery Module contains functions to parse and serialize CSV data. CSV (comma-separated values) is a popular representation for tabular data, exported, e.g., from Excel.
Conventions
All functions in this module are assigned to the http://basex.org/modules/csv namespace, which is statically bound to the csv prefix.
All errors are assigned to the http://basex.org/errors namespace, which is statically bound to the bxerr prefix.
Rules
Version 7.7.2: the conversion rules have been updated and aligned with the JSON parser.
CSV is converted to XML as follows:
- The resulting XML document has a <csv/> root element.
- Rows are represented via <record/> elements.
- Fields are represented via <entry/> elements. The value of a field is represented as a text node.
- If the header option is set to true, the first text line is parsed as table header, and the entry elements are replaced with the field names:
  - Empty names are represented by a single underscore (_), and characters that are not valid in element names are replaced with underscores.
  - If the lax option is set to false, invalid characters will be rewritten to an underscore followed by the character's four-digit hexadecimal Unicode code point, and underscores will be represented as two underscores (__). The resulting element names may be less readable, but can always be converted back to the original field names.
- If format is set to attributes, field names will be stored in name attributes.
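To make the rules above concrete, here is a minimal Python sketch of the conversion. It is not BaseX's implementation: csv_to_xml and encode_name are hypothetical helpers, quoted fields are handled by Python's csv module (whose quoting rules may differ from BaseX), the test for valid element-name characters is simplified, fields beyond the header row are assumed to fall back to entry elements, and the lowercase hex digits in strict mode are an assumption.

```python
import csv
import io
import xml.etree.ElementTree as ET


def encode_name(name: str, lax: bool = True) -> str:
    """Turn a CSV header field into an element name (simplified sketch)."""
    if name == "":
        return "_"  # empty names become a single underscore
    out = []
    for i, ch in enumerate(name):
        # Simplified validity test: real XML name rules allow more characters.
        valid = ch.isalpha() or ch == "_" or (i > 0 and (ch.isdigit() or ch in "-."))
        if ch == "_" and not lax:
            out.append("__")  # strict: literal underscores are doubled
        elif valid:
            out.append(ch)
        elif lax:
            out.append("_")  # lax: invalid character -> underscore
        else:
            # strict: underscore + four hex digits (lowercase is an assumption)
            out.append("_%04x" % ord(ch))
    return "".join(out)


def csv_to_xml(text: str, separator: str = ",", header: bool = False,
               fmt: str = "direct", lax: bool = True) -> ET.Element:
    """Build <csv><record><entry>...</entry></record></csv> from CSV text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    names = rows.pop(0) if header and rows else []
    root = ET.Element("csv")
    for row in rows:
        record = ET.SubElement(root, "record")
        for i, value in enumerate(row):
            if i >= len(names):
                # no header name for this column (assumption: keep <entry>)
                entry = ET.SubElement(record, "entry")
            elif fmt == "attributes":
                entry = ET.SubElement(record, "entry", name=names[i])
            else:  # "direct": the field name becomes the element name
                entry = ET.SubElement(record, encode_name(names[i], lax))
            entry.text = value
    return root


print(ET.tostring(csv_to_xml("Name,Age\nJohn,42\n", header=True), encoding="unicode"))
# prints <csv><record><Name>John</Name><Age>42</Age></record></csv>
```

With lax=False, a header field such as "first name" would become first_0020name in this sketch, while the lax default yields first_name.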
In the Database Creation dialog of the GUI, when the CSV parser is selected, the Parsing tab demonstrates the conversion of CSV to XML and the effects of the individual conversion options.
Options
The following options are available:
Parameter | Description | Allowed | Default |
---|---|---|---|
separator | Defines the character that separates the entries of a record in a single line. | comma, semicolon, colon, tab, space or a single character | comma |
header | Indicates if the first line of the parsed or serialized CSV data is a table header. | yes, no | no |
format | Specifies the format of the XML data. The format is only relevant if the header option is activated: in the direct conversion format, field names are represented as element names; in the attributes conversion, field names are stored in name attributes. | direct, attributes | direct |
lax | Specifies if a lax approach is used to convert field names to XML element names. | yes, no | yes |
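The table above can be read as a small normalization step: named separators map to characters, any other single character is accepted, and missing options take their defaults. The following Python sketch models that reading; resolve_separator, resolve_options and SeparatorError are hypothetical names, and treating only "yes" as true is an assumption based on the allowed values listed.

```python
from typing import Optional

# Named separators from the options table; any other single character is allowed.
SEPARATORS = {"comma": ",", "semicolon": ";", "colon": ":", "tab": "\t", "space": " "}

# Defaults as listed in the options table.
DEFAULTS = {"separator": "comma", "header": "no", "format": "direct", "lax": "yes"}


class SeparatorError(ValueError):
    """Hypothetical stand-in for the BXCS0003 error."""


def resolve_separator(value: str) -> str:
    if value in SEPARATORS:
        return SEPARATORS[value]
    if len(value) == 1:
        return value
    raise SeparatorError("BXCS0003: the specified separator must be a single character")


def resolve_options(options: Optional[dict] = None) -> dict:
    """Merge user options over the defaults and normalize their values."""
    opts = {**DEFAULTS, **(options or {})}
    if opts["format"] not in ("direct", "attributes"):
        raise ValueError(f"unknown format: {opts['format']}")
    return {
        "separator": resolve_separator(opts["separator"]),
        "header": opts["header"] == "yes",  # assumption: only "yes" means true
        "format": opts["format"],
        "lax": opts["lax"] == "yes",
    }


print(resolve_options({"separator": "semicolon", "header": "yes"}))
```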
Functions
csv:parse
Version 7.8: the return type has been changed from element(csv) to document-node(element(csv)), and the format and lax options have been added.
Signatures:
csv:parse($input as xs:string) as document-node(element(csv))
csv:parse($input as xs:string, $options as item()) as document-node(element(csv))

Summary: Converts the CSV data specified by $input to XML, and returns the result as a document node with a <csv/> root element. The $options argument can be used to control the way the input is converted. Options can either be specified
- as children of a <csv:options/> element, e.g.:
    <csv:options>
      <csv:separator value=';'/>
      ...
    </csv:options>
- or as a map, e.g.:
    { 'separator': ';', ... }

Errors:
- BXCS0001: the input cannot be converted.
- BXCS0003: the specified separator must be a single character.
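Both ways of passing options carry the same key/value pairs: each child element of <csv:options/> names an option, and its value attribute holds the value. The Python sketch below illustrates that equivalence by turning the element form into a dictionary. It is a hypothetical helper, not how BaseX reads options; note that outside XQuery, where the csv prefix is not statically bound, the namespace must be declared explicitly.

```python
import xml.etree.ElementTree as ET

CSV_NS = "http://basex.org/modules/csv"


def options_from_xml(xml_text: str) -> dict:
    """Convert a <csv:options/> element into a plain {name: value} dict."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{CSV_NS}}}options":
        raise ValueError("expected a csv:options element")
    opts = {}
    for child in root:
        # ElementTree tags look like '{namespace}separator'; keep the local name.
        local = child.tag.split("}", 1)[-1]
        opts[local] = child.get("value")
    return opts


xml_form = ("<csv:options xmlns:csv='http://basex.org/modules/csv'>"
            "<csv:separator value=';'/><csv:header value='yes'/>"
            "</csv:options>")
print(options_from_xml(xml_form))
```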
csv:serialize
Signatures:
csv:serialize($input as node(), $options as item()) as xs:string

Summary: Serializes the node specified by $input as CSV data, and returns the result as xs:string. XML documents can also be serialized as CSV if the method serialization option is set to csv. The $options argument can be used to control the way the node is serialized. Options can either be specified
- as children of a <csv:options/> element, e.g.:
    <csv:options>
      <csv:separator value=';'/>
      ...
    </csv:options>
- or as a map, e.g.:
    { 'separator': ';', ... }

Errors:
- BXCS0002: the input cannot be serialized.
- BXCS0003: the specified separator must be a single character.
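The reverse direction can be sketched the same way. This Python function is a hypothetical illustration, not csv:serialize itself: it assumes the <csv>/<record> layout described in the rules, takes header names from the first record (from name attributes if present, else from element names, without decoding strict-mode names), and relies on Python's csv writer for quoting, which may differ from BaseX.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(root: ET.Element, separator: str = ",", header: bool = False) -> str:
    """Write the <record> children of a <csv> element as CSV text."""
    records = root.findall("record")
    out = io.StringIO()
    writer = csv.writer(out, delimiter=separator, lineterminator="\n")
    if header and records:
        # attributes format: name attribute; direct format: element name
        writer.writerow([child.get("name", child.tag) for child in records[0]])
    for record in records:
        writer.writerow([child.text or "" for child in record])
    return out.getvalue()


doc = ET.fromstring("<csv><record><Name>John</Name><Age>42</Age></record></csv>")
print(xml_to_csv(doc, separator=";", header=True), end="")
```

Fields that contain the separator are quoted by the csv writer, so a value such as "Jo, Jr." survives a round trip with the comma separator.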
Errors
Code | Description |
---|---|
BXCS0001 | The input cannot be converted. |
BXCS0002 | The node cannot be serialized. |
BXCS0003 | The specified separator must be a single character. |
Changelog
- Version 7.8
  - Updated: the return type of csv:parse changed from element(csv) to document-node(element(csv))
  - Added: format and lax options
The module was introduced with Version 7.7.2.