Changes

50 bytes removed, 19:24, 27 March 2015
Please visit the [[Serialization]] article if you want to know how to export data.
==XML Parsers==
BaseX provides two parsers to import XML data:
* Java’s [http://download.oracle.com/javase/6/docs/api/javax/xml/parsers/SAXParser.html SAXParser] can also be selected for parsing XML documents. This parser is stricter than the built-in parser, but it refuses to process some large documents.
===GUI===
Go to Menu ''Database'' → ''New'', then choose the ''Parsing'' tab and (de)activate ''Use internal XML parser''. The parsing of DTDs can be turned on/off by selecting the checkbox below.
===Command Line===
To turn the internal XML parser and DTD parsing on/off, modify the <code>INTPARSE</code> and <code>DTD</code> options:
SET [[Options#DTD|DTD]] true
===XQuery===
The [[Database Module#db:add|db:add]] and [[Database Module#db:replace|db:replace]] functions can also be used to add new XML documents to the database. The following example query uses the internal XML parser and adds all files found in the directory <code>2Bimported</code> to the database <code>DB</code>:
</pre>
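As a minimal sketch of such a query, the loop below adds each entry of the directory to the database. It assumes that the database <code>DB</code> already exists, that the directory is resolved relative to the current working directory, and that the parser is selected via the options argument of <code>db:add</code>; the target path and the option map are illustrative choices, not necessarily those of the original example:
<pre>
(: Sketch: add every entry of the directory "2Bimported" to the database "DB".
   Assumption: the internal parser is requested via the options argument. :)
for $name in file:list('2Bimported')
let $path := '2Bimported/' || $name
return db:add('DB', $path, $name, map { 'intparse': true() })
</pre>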
==HTML Parser==
With [http://home.ccil.org/~cowan/XML/tagsoup/ TagSoup], HTML can be imported into BaseX even if it is not well-formed. TagSoup ensures that only well-formed HTML arrives at the XML parser (correct opening and closing tags, etc.). If TagSoup is not available on a system, importing HTML will fail in many cases, regardless of whether you use the GUI or the standalone mode.
===Installation===
=====Downloads=====
TagSoup is already included in the full BaseX distributions ({{Code|BaseX.zip}}, {{Code|BaseX.exe}}, etc.). It can also be manually downloaded and embedded on the appropriate platforms.
=====Maven=====
An easy way to add TagSoup to your own project is to follow these steps:
5. Don't forget to run <code>mvn jetty:run</code> again
=====Debian=====
With Debian, TagSoup will be automatically detected and included after it has been installed via:
apt-get install libtagsoup-java
===TagSoup Options===
TagSoup offers a variety of options to customize the HTML conversion. For the complete list
* '''reuse''', '''help''': not supported.
====GUI====
Go to Menu ''Database'' → ''New'' and select "HTML" in the input format combo box.
can be entered.
====Command Line====
Turn on the HTML Parser before parsing documents, and set a file filter:
SET [[Options#CREATEFILTER|CREATEFILTER]] *.html
====XQuery====
The [[HTML Module]] provides a function for converting HTML to XML documents.
</pre>
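The following sketch shows how the conversion might be called from XQuery. The input string is made up for illustration, and the second call assumes that the TagSoup option <code>nons</code> is passed as a map entry:
<pre>
(: Name of the HTML parser in use; an empty string means TagSoup was not found :)
html:parser(),
(: Convert an HTML string that is not well-formed to an XML document node :)
html:parse('<html><body><p>Hello<br>World</body></html>'),
(: Same input, with the TagSoup option "nons" supplied in the options map :)
html:parse('<html><body><p>Hello<br>World</body></html>', map { 'nons': true() })
</pre>
If TagSoup is not found, the input is handed to the XML parser as is, so the two conversion calls are expected to fail for this input.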
==JSON Parser==
BaseX can also import JSON documents. The resulting format is described in the documentation for the XQuery [[JSON Module]]:
===GUI===
Go to Menu ''Database'' → ''New'' and select "JSON" in the input format combo box.
* '''JsonML''': Activate this option if the incoming file is a JsonML file.
===Command Line===
Turn on the JSON Parser before parsing documents, and set some optional, parser-specific options and a file filter:
SET [[Options#CREATEFILTER|CREATEFILTER]] *.json
===XQuery===
The [[JSON Module]] provides functions for converting JSON objects to XML documents.
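A minimal sketch of such a conversion is shown below. The input strings are invented for illustration; the second call assumes that the JsonML format is selected via the <code>format</code> option:
<pre>
(: Convert a JSON object to XML, using the default conversion format :)
json:parse('{ "title": "BaseX", "tags": [ "xml", "json" ] }'),
(: Convert a JsonML array; the format is chosen in the options map :)
json:parse('[ "p", "some text" ]', map { 'format': 'jsonml' })
</pre>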
==CSV Parser==
BaseX can be used to import CSV documents. The available alternatives are described in the following sections:
===GUI===
Go to Menu ''Database'' → ''New'' and select "CSV" in the input format combo box.
* '''Header''': Activate this option if the incoming CSV files have a header line.
===Command Line===
Turn on the CSV Parser before parsing documents, and set some optional, parser-specific options and a file filter. Unicode code points can be specified as separators; {{Code|32}} is the code point for spaces:
SET [[Options#CREATEFILTER|CREATEFILTER]] *.csv
===XQuery===
The [[CSV Module]] provides a function for converting CSV to XML documents.
</pre>
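As a rough sketch, a small CSV string with a header line could be converted as follows. The sample data is invented, and the <code>header</code> option is assumed to be passed in the options map:
<pre>
(: Build a three-line CSV string; codepoints-to-string(10) yields a newline :)
let $csv := string-join(
  ('Name,City', 'Ann,Berlin', 'Bob,Oslo'),
  codepoints-to-string(10)
)
(: The first line is treated as header because of the "header" option :)
return csv:parse($csv, map { 'header': true() })
</pre>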
==Text Parser==
Plain text can be imported as well:
===GUI===
Go to Menu ''Database'' → ''New'' and select "TEXT" in the input format combo box.
* '''Lines''': Activate this option to create a <code>&lt;line&gt;...&lt;/line&gt;</code> element for each line of the input text file.
===Command Line===
Turn on the Text Parser before parsing documents, and set some optional, parser-specific options and a file filter:
SET [[Options#CREATEFILTER|CREATEFILTER]] *
===XQuery===
Similar to the other formats, the text parser can be specified in the prolog of an