Changes

Jump to navigation Jump to search
385 bytes added ,  14:17, 20 July 2022
no edit summary
This [[Module Library|XQuery Module]] contains a single function to parse CSV input. [httphttps://en.wikipedia.org/wiki/Comma-separated_values CSV] (comma-separated values) is a popular representation for tabular data, exported e. g. from Excel.
=Conventions=
If the {{Code|direct}} or {{Code|attributes}} format is chosen, a CSV string is converted to XML:
* The resulting XML document has a {{Code|<csv>}} root element.* Rows are represented via {{Code|<record>}} elements.* Fields are represented via {{Code|<entry>}} elements. The value of a field is represented as text node.* If the {{Code|header}} option is set to {{Code|true}}, the first text line is parsed as table header, and the {{Code|<entry>}} elements are replaced with the field names:** Empty names are represented by a single underscore ({{Code|_}}), and characters that are not valid in element names are replaced with underscores or (when invalid as first character of an element name) prefixed with an underscore.** If the {{Code|laxformat}} option is set to {{Code|falsedirect}}, invalid characters will be rewritten to an underscore and the character’s four-digit Unicodefield names are encoded, as described in the [[Conversion Module#Keys|Conversion Module]], and underscores will be represented used as two underscores ({{Code|__}}). The resulting element names may be less readable, but can always be converted back to the original field names.* If * Otherwise, if {{Code|format}} is set to {{Code|attributes}}, the field names will be stored in name attributes.
'''A little advice''': in the Database Creation dialog of the GUI, if you select CSV Parsing and switch to the ''Parsing'' tab, you can see the effects of some of the conversion options.
* <code>$csv?records[5]</code> returns all entries of the 5th record (row)
* <code>$csv?records?(2)</code> returns all entries of the 2nd field (column)
* <code>$csv?names?*</code> returns the names of all fields (if available)
* Return enumerated strings for all records:
==Options==
In the following table, all available options are listed. The Excel column indicates what are the preferred lists recommended options for data that is to be imported, processed with Excel or has been exported from ExcelOpen/Libre Office.
{| class="wikitable sortable" width="100%"
|- valign="top"
! width="140" | Option! width="50%" | Description
! Allowed
! Default
! Direction
! Excel
|- valign="top"
| {{Code|comma}}, {{Code|semicolon}}, {{Code|colon}}, {{Code|tab}}, {{Code|space}} or a ''single character''
| {{Code|comma}}
| ''parse'', ''serialize''| {{Code|semicolon}}or {{Code|comma}}, depending on the region
|- valign="top"
| {{Code|header}}
| {{Code|yes}}, {{Code|no}}
| {{Code|no}}
| ''parse'', ''serialize''
|
|- valign="top"
| {{Code|direct}}, {{Code|attributes}}, {{Code|xquery}}
| {{Code|direct}}
| ''parse'', ''serialize''
|
|- valign="top"
| {{Code|yes}}, {{Code|no}}
| {{Code|yes}}
| ''parse'', ''serialize''
| {{Code|no}}
|- valign="top"
| {{Code|yes}}, {{Code|no}}
| {{Code|yes}}
| ''parse'', ''serialize''
| {{Code|yes}}
|- valign="top"
| {{Code|yes}}, {{Code|no}}
| {{Code|no}}
| ''parse'', ''serialize''
| {{Code|no}}
|- valign="top"
| {{Code|allow}}
| In Excel, a value will be evaluated if it starts with the character {{Code|-}}, {{Code|+}}, <code>=</code>, {{Code|@}}, {{Code|\t}} or {{Code|\r}}. A regular expression can be specified to reject data that will be handled differently than expected by an application, or that may be malicious (see https://owasp.org/www-community/attacks/CSV_Injection for more details).
| ''string''
|
| ''serialize''
| <code><nowiki>[^-+=\t\r].*|[-+]\d*([,.]\d+)?</nowiki></code>
|}
=Functions=
==csv:parsedoc==
{| width='100%'
|-valign="top"
| width='120' | '''Signatures'''
|{{Func|csv:parsedoc|$string uri as xs:string?|item()?}}<br/>{{Func|csv:parsedoc|$string uri as xs:string?, $options as map(*)?|item()?}}<br />|-valign="top"
| '''Summary'''
|Converts Fetches the CSV document referred to by the given {{Code|$stringuri}} and converts it to an XQuery value. The {{Code|$options}} argument can be used to control the way the input is converted.|-valign="top"
| '''Errors'''
|{{Error|parse|#Errors}} the specified input cannot be parsed as CSV document.<br/>{{Error|options|#Errors}} the specified options are conflicting.
|}
==csv:docparse== {{Mark|Introduced with BaseX 9.4:}}
{| width='100%'
|-valign="top"
| width='120' | '''Signatures'''
|{{Func|csv:docparse|$uri string as xs:string?|item()?}}<br />{{Func|csv:docparse|$uri string as xs:string?, $options as map(*)?|item()?}}<br />|-valign="top"
| '''Summary'''
|Fetches Converts the CSV document referred to by the given {{Code|$uristring}} and converts it to an XQuery value. The {{Code|$options}} argument can be used to control the way the input is converted.|-valign="top"
| '''Errors'''
|{{Error|parse|#Errors}} the specified input cannot be parsed as CSV document.<br/>{{Error|options|#Errors}} the specified options are conflicting.
|}
{| width='100%'
|-valign="top"
| width='120' | '''Signatures'''
|{{Func|csv:serialize|$input as item()?|xs:string}}<br/>{{Func|csv:serialize|$input as item()?, $options as map(*)?|xs:string}}
|-valign="top"
| '''Summary'''
|Serializes the specified {{Code|$input}} as CSV, using the specified {{Code|$options}}, and returns the result as string.
* The parameter {{Code|method}} needs to be set to {{Code|csv}}, and
* the options presented in this article need to be assigned to the {{Code|csv}} parameter.
|-valign="top"
| '''Errors'''
|{{Error|serialize|#Errors}} the input cannot be serialized.
! width="110"|Code
|Description
|-valign="top"
|{{Code|parse}}
| The input cannot be parsed.
|-valign="top"
|{{Code|serialize}}
| The node cannot be serialized.
=Changelog=
 
;Version 9.7
* Added: [[#Options|Options]]: <code>allow</code> option.
;Version 9.4
* Added: [[#csv:doc{{Function||csv:doc]]}}
; Version 9.1
* Updated: [[#csv:parse{{Function||csv:parse]] }} can be called with empty sequence.
;Version 9.0
;Version 7.8
* Updated: [[#csv:parse{{Function||csv:parse]] }} now returns a document node instead of an element, or an XQuery map if {{Code|format}} is set to {{Code|map}}.
* Added: {{Code|format}} and {{Code|lax}} options
The module was introduced with Version 7.7.2.
Bureaucrats, editor, reviewer, Administrators
13,550

edits

Navigation menu