Changes

148 bytes added, 19:39, 29 February 2012
==HTML Parsers==
With [http://home.ccil.org/~cowan/XML/tagsoup/ TagSoup], HTML can be imported into BaseX even when it is not well-formed. TagSoup ensures that only well-formed markup arrives at the XML parser (correct opening and closing tags, etc.). Hence, if TagSoup is not available on a system, importing HTML will fail in many cases.
===Installation===
If the TagSoup classes are accessible via the classpath, or if you run BaseX from the sources and use the Maven build manager, BaseX will automatically use TagSoup to prepare HTML input. Otherwise you may be faced with XML syntax issues during the import process. This applies regardless of whether you use the GUI or the standalone mode. TagSoup is also included in the complete BaseX distributions (BaseX.zip, BaseX.exe, etc.) and can be downloaded manually on the appropriate platforms. On Debian, TagSoup is included automatically once it has been installed via:
# apt-get install libtagsoup-java
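Since everything above depends on whether the TagSoup classes are visible on the classpath, a quick check can help when an HTML import fails unexpectedly. The following is a minimal sketch, not the detection code BaseX itself uses: the class <code>TagSoupCheck</code> and its method <code>isAvailable</code> are hypothetical names introduced here for illustration. It assumes only that TagSoup's SAX parser class is <code>org.ccil.cowan.tagsoup.Parser</code>.

```java
public final class TagSoupCheck {
    /** Fully qualified name of the TagSoup SAX parser class. */
    static final String TAGSOUP_PARSER = "org.ccil.cowan.tagsoup.Parser";

    private TagSoupCheck() { }

    /**
     * Reports whether a class can be located on the current classpath.
     * The class is looked up without being initialized.
     */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, TagSoupCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing, or present but not loadable.
            return false;
        }
    }

    public static void main(String[] args) {
        if (isAvailable(TAGSOUP_PARSER)) {
            System.out.println("TagSoup found on the classpath.");
        } else {
            System.out.println("TagSoup not found; HTML import may fail on malformed input.");
        }
    }
}
```

Run the check with the same classpath that is used to start BaseX, so that the result reflects what BaseX will actually see.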