Changes

80 bytes removed, 19:44, 29 February 2012
==HTML Parsers==
With [http://home.ccil.org/~cowan/XML/tagsoup/ TagSoup], HTML can be imported into BaseX even when it is not well-formed. TagSoup ensures that only well-formed HTML arrives at the XML parser (correct opening and closing tags, etc.). If TagSoup is not available on a system, importing HTML will fail in many cases, regardless of whether you use the GUI or the standalone mode.
===Installation===
If the TagSoup classes are accessible via the classpath, or if you run BaseX from the sources and use the Maven build manager, BaseX will automatically use TagSoup to prepare HTML input. Otherwise, you may encounter XML syntax errors during the import. TagSoup is also included in the complete BaseX distributions (BaseX.zip, BaseX.exe, etc.), or it can be downloaded manually and embedded on the appropriate platforms. On Debian, TagSoup is picked up automatically once it has been installed via:
apt-get install libtagsoup-java
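Whether the TagSoup classes are reachable from the classpath can be checked with a small Java program. The following is only a minimal sketch and not how BaseX itself performs the check: the class <code>TagSoupCheck</code> and the method <code>isAvailable</code> are hypothetical names, and the sketch assumes that TagSoup's parser class is <code>org.ccil.cowan.tagsoup.Parser</code>, as in the standard TagSoup distribution.

```java
// Hypothetical helper, not part of BaseX: reports whether a class can be
// loaded from the current classpath.
final class TagSoupCheck {
  // Assumption: fully qualified name of TagSoup's SAX parser class.
  static final String TAGSOUP_PARSER = "org.ccil.cowan.tagsoup.Parser";

  // Returns true if the named class can be found, false otherwise.
  // The class is located but not initialized.
  static boolean isAvailable(String className) {
    try {
      Class.forName(className, false, TagSoupCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isAvailable(TAGSOUP_PARSER)
        ? "TagSoup found on the classpath"
        : "TagSoup not found; HTML import may fail with XML syntax errors");
  }
}
```

If the program is started with the same classpath as BaseX (for example, with the TagSoup JAR added via <code>-cp</code>), its output should match what BaseX will be able to load.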
Go to Menu ''Database'' → ''New'' and select "HTML" in the input format combo box.
The "Parsing" tab indicates whether TagSoup is available.
The same applies to the "Resources" tab in the "Database Properties" dialog.