Changes

Parsers

Revision as of 19:24, 27 March 2015

50 bytes removed, 19:24, 27 March 2015

no edit summary

Please visit the [[Serialization]] article if you want to know how to export data.

==XML Parsers==

BaseX provides two parsers to import XML data:

* BaseX’s built-in (internal) XML parser, which is more lenient than Java’s parser.
* Java’s [http://download.oracle.com/javase/6/docs/api/javax/xml/parsers/SAXParser.html SAXParser] can also be selected for parsing XML documents. This parser is stricter than the built-in parser, but it refuses to process some large documents.
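The following sketch shows what parsing with Java's SAXParser looks like, using only the standard JDK classes linked above. It is an illustration, not BaseX code: the hypothetical <code>SaxElementCount</code> class parses a string, counts the start tags the parser reports, and wraps parse errors in an unchecked exception.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Counts the elements of an XML string with the JDK's SAXParser (illustration only, not BaseX code). */
final class SaxElementCount {

    static int countElements(String xml) {
        // One-element array so the anonymous handler can update the counter.
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                count[0]++; // called once per start tag or empty-element tag
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // SAXParser rejects input that is not well-formed XML.
            throw new IllegalArgumentException("Input could not be parsed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countElements("<doc><a/><b><c/></b></doc>")); // prints 4
    }
}
```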

===GUI===

Go to Menu ''Database'' → ''New'', then choose the ''Parsing'' tab and (de)activate ''Use internal XML parser''. The parsing of DTDs can be turned on/off by selecting the checkbox below.

===Command Line===

To turn the internal XML parser and DTD parsing on/off, modify the <code>INTPARSE</code> and <code>DTD</code> options:

SET [[Options#INTPARSE|INTPARSE]] true

SET [[Options#DTD|DTD]] true
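For comparison, the JDK's bundled SAX parser exposes a similar switch as the Xerces feature <code>http://apache.org/xml/features/nonvalidating/load-external-dtd</code>. The sketch below is a hypothetical <code>DtdToggle</code> helper, not BaseX's implementation of the <code>DTD</code> option. It assumes that the DTD file named in the example (<code>no-such-file.dtd</code>) does not exist, so the document only parses when DTD loading is turned off.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Switches external DTD loading on or off for the JDK's SAX parser (illustration only). */
final class DtdToggle {

    // Xerces feature name recognized by the JDK's bundled SAX parser.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Returns true if the input parses; loadDtd controls whether the external DTD is read. */
    static boolean parses(String xml, boolean loadDtd) {
        SAXParser parser;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setFeature(LOAD_EXTERNAL_DTD, loadDtd);
            parser = factory.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            // The feature is not supported by this parser implementation.
            throw new IllegalStateException(e);
        }
        try {
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException | IOException e) {
            // Malformed input, or the external DTD could not be read.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical DTD file name that is assumed not to exist.
        String xml = "<!DOCTYPE doc SYSTEM \"no-such-file.dtd\"><doc/>";
        System.out.println(parses(xml, false)); // the DTD is never opened
        System.out.println(parses(xml, true));  // the missing DTD file is reported
    }
}
```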

===XQuery===

The [[Database Module#db:add|db:add]] and [[Database Module#db:replace|db:replace]] functions can also be used to add new XML documents to the database. The following example query uses the internal XML parser and adds all files found in the directory <code>2Bimported</code> to the database <code>DB</code>:


==HTML Parser==

With [http://home.ccil.org/~cowan/XML/tagsoup/ TagSoup], HTML can be imported into BaseX without problems. TagSoup ensures that only well-formed HTML reaches the XML parser (correctly matched opening and closing tags, etc.). If TagSoup is not available on a system, importing HTML will fail in many cases, whether you use the GUI or the standalone mode.
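The problem TagSoup solves can be reproduced with any strict XML parser. In the sketch below, a hypothetical <code>WellFormedCheck</code> class hands HTML snippets to the JDK's SAX parser: an unclosed <code>&lt;br&gt;</code> tag and the HTML entity <code>&amp;nbsp;</code> are both rejected, while the well-formed variant is accepted. TagSoup itself is not used here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Checks whether markup is well-formed XML (hypothetical helper; no TagSoup involved). */
final class WellFormedCheck {

    static boolean isWellFormed(String markup) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(markup)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // e.g. an unclosed tag or an undeclared entity
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>Line one<br>Line two</p>"));  // false
        System.out.println(isWellFormed("<p>Line one<br/>Line two</p>")); // true
        System.out.println(isWellFormed("<p>a&nbsp;b</p>"));              // false
    }
}
```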

===Installation===

====Downloads====

TagSoup is already included in the full BaseX distributions ({{Code|BaseX.zip}}, {{Code|BaseX.exe}}, etc.). On other platforms, it can be downloaded manually and added to the classpath.

====Maven====

An easy way to add TagSoup to your own project is to follow these steps:

5. Don't forget to run <code>mvn jetty:run</code> again.

====Debian====

On Debian, TagSoup will be automatically detected and included after it has been installed via:

apt-get install libtagsoup-java
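How the automatic detection works is not described here. A plausible approach, assumed in the sketch below, is to check whether TagSoup's parser class <code>org.ccil.cowan.tagsoup.Parser</code> can be loaded from the classpath. The <code>TagSoupCheck</code> class is hypothetical and not taken from BaseX.

```java
/** Checks whether TagSoup's parser class can be loaded (assumed detection approach). */
final class TagSoupCheck {

    // Fully qualified name of TagSoup's SAX parser class.
    static final String TAGSOUP_PARSER = "org.ccil.cowan.tagsoup.Parser";

    static boolean isOnClasspath(String className) {
        try {
            // 'false': load the class without running its static initializers.
            Class.forName(className, false, TagSoupCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("TagSoup available: " + isOnClasspath(TAGSOUP_PARSER));
    }
}
```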

===TagSoup Options===

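TagSoup offers a variety of options to customize the HTML conversion. For the complete list of options, please visit the [http://home.ccil.org/~cowan/XML/tagsoup/ TagSoup website]. A few of them cannot be used in BaseX: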

* '''reuse''', '''help''': not supported.

====GUI====

Go to Menu ''Database'' → ''New'' and select "HTML" in the input format combo box.

can be entered.

====Command Line====

Turn on the HTML Parser before parsing documents, and set a file filter:

SET [[Options#PARSER|PARSER]] html

SET [[Options#CREATEFILTER|CREATEFILTER]] *.html
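The file filter values used in this article (<code>*.html</code>, <code>*.json</code>, <code>*.csv</code> and <code>*</code>) look like glob patterns. Assuming glob semantics, the following sketch uses Java NIO's <code>PathMatcher</code> to list which files in a directory such a filter would select. The <code>CreateFilterSketch</code> class is hypothetical, and BaseX's own matching rules may differ in details such as case sensitivity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Lists the files in a directory whose names match a glob such as "*.html" (assumed semantics). */
final class CreateFilterSketch {

    static List<String> matching(Path dir, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(Files::isRegularFile)
                .map(Path::getFileName)   // match against the file name only
                .filter(matcher::matches)
                .map(Path::toString)
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage: list the HTML files in the current directory.
        System.out.println(matching(Paths.get("."), "*.html"));
    }
}
```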

====XQuery====

The [[HTML Module]] provides a function for converting HTML to XML documents.


==JSON Parser==

BaseX can also import JSON documents. The resulting format is described in the documentation for the XQuery [[JSON Module]].

===GUI===

Go to Menu ''Database'' → ''New'' and select "JSON" in the input format combo box.

* '''JsonML''': Activate this option if the incoming file is a JsonML file.

===Command Line===

Turn on the JSON Parser before parsing documents, and set some optional, parser-specific options and a file filter:

SET [[Options#PARSER|PARSER]] json

SET [[Options#CREATEFILTER|CREATEFILTER]] *.json

===XQuery===

The [[JSON Module]] provides functions for converting JSON objects to XML documents.

==CSV Parser==

BaseX can also import CSV documents. The different ways to do so are shown below.

===GUI===

Go to Menu ''Database'' → ''New'' and select "CSV" in the input format combo box.

* '''Header''': Activate this option if the incoming CSV files have a header line.

===Command Line===

Turn on the CSV Parser before parsing documents, and set some optional, parser-specific options and a file filter. Unicode code points can be specified as separators; {{Code|32}} is the code point for spaces:

SET [[Options#PARSER|PARSER]] csv

SET [[Options#CREATEFILTER|CREATEFILTER]] *.csv
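A separator given as a code point is simply the Unicode value of the character: <code>44</code> is a comma and <code>32</code> is a space. The following sketch converts simple CSV to XML using such a code point and an optional header line. The hypothetical <code>CsvToXml</code> class is not BaseX's parser; the element names <code>csv</code>, <code>record</code> and <code>entry</code> are assumed to mirror the format described in the [[CSV Module]] article, quoted fields are not handled, and header names are assumed to be valid XML element names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Converts simple CSV (no quoted fields) to XML; an illustration, not BaseX's CSV parser. */
final class CsvToXml {

    /** Escapes characters that must not appear verbatim in XML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** separator is a Unicode code point, e.g. 44 for ',' or 32 for ' '. */
    static String convert(String csv, int separator, boolean header) {
        String sep = Pattern.quote(new String(Character.toChars(separator)));
        List<String[]> rows = new ArrayList<>();
        for (String line : csv.split("\r?\n")) {
            if (!line.isEmpty()) rows.add(line.split(sep, -1));
        }
        // With a header, the first row supplies the element names (assumed valid XML names).
        String[] names = header && !rows.isEmpty() ? rows.remove(0) : null;
        StringBuilder sb = new StringBuilder("<csv>");
        for (String[] row : rows) {
            sb.append("<record>");
            for (int i = 0; i < row.length; i++) {
                String name = names != null && i < names.length ? names[i] : "entry";
                sb.append('<').append(name).append('>')
                  .append(escape(row[i]))
                  .append("</").append(name).append('>');
            }
            sb.append("</record>");
        }
        return sb.append("</csv>").toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("name,age\nAnn,42", 44, true));
        System.out.println(convert("a b\n1 2", 32, false));
    }
}
```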

===XQuery===

The [[CSV Module]] provides a function for converting CSV to XML documents.


==Text Parser==

Plain text can be imported as well.

===GUI===

Go to Menu ''Database'' → ''New'' and select "TEXT" in the input format combo box.

* '''Lines''': Activate this option to create a <code><line>...</line></code> element for each line of the input text file.
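A rough sketch of what the ''Lines'' option produces: each input line is wrapped in a <code><line>...</line></code> element, with XML special characters escaped. The hypothetical <code>TextToXml</code> class and the root element name <code>text</code> are assumptions for illustration, not BaseX's text parser.

```java
/** Wraps plain text in XML, optionally one <line> element per input line (illustration only). */
final class TextToXml {

    /** Escapes characters that must not appear verbatim in XML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String convert(String text, boolean lines) {
        // Root element name "text" is an assumption.
        StringBuilder sb = new StringBuilder("<text>");
        if (lines) {
            // split() drops trailing empty lines; empty lines in between are kept.
            for (String line : text.split("\r?\n")) {
                sb.append("<line>").append(escape(line)).append("</line>");
            }
        } else {
            sb.append(escape(text));
        }
        return sb.append("</text>").toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("first\nsecond <2>\n", true));
        System.out.println(convert("a & b", false));
    }
}
```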

===Command Line===

Turn on the Text Parser before parsing documents, and set some optional, parser-specific options and a file filter:

SET [[Options#PARSER|PARSER]] text

SET [[Options#CREATEFILTER|CREATEFILTER]] *

===XQuery===

Similar to the other formats, the text parser can be specified in the prolog of an

CG
