Changes

356 bytes removed, 00:19, 10 March 2012
apt-get install libtagsoup-java
 
===Options===
 
TagSoup offers a variety of options to customize the import of HTML. For the complete list,
please visit the [http://home.ccil.org/~cowan/XML/tagsoup/ TagSoup] website. BaseX supports
most of these options, with a few exceptions:
 
* '''encoding''': BaseX tries to guess the input encoding, but the user can override it if necessary.
* '''files''': Not supported, as input documents are piped directly to the XML parser.
* '''method''': Set to 'xml' by default. If it is set to 'html', end tags may be missing, for instance.
* '''version''': Ignored, as TagSoup always falls back to 'version 1.0', no matter what the input is.
* '''standalone''': Deactivated.
* '''pyx''', '''pyxin''': Not supported, as the XML parser cannot handle this kind of input.
* '''output-encoding''': Not supported, as BaseX already takes care of the output encoding.
* '''reuse''', '''help''': Not supported.
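
The encoding override described above can be sketched in XQuery. This is a minimal sketch, not a definitive recipe: it assumes that TagSoup options are passed as <code>key=value</code> pairs through a <code>db:htmlopt</code> option (the XQuery counterpart of HTMLOPT), and that the hypothetical input file <code>index.html</code> is encoded in Shift_JIS:

<pre class="brush:xquery">
(: assumption: TagSoup is on the classpath, so the HTML parser is available :)
declare option db:parser "html";
(: assumption: replaces the guessed input encoding with an explicit one :)
declare option db:htmlopt "encoding=Shift_JIS";
doc("index.html")
</pre>

If the guessed encoding is already correct, the second declaration can simply be dropped.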
===GUI===
===XQuery===
 
<pre class="brush:xquery">
declare option db:parser "html";
doc("index.html")
</pre>
 
===TagSoup Options===
 
TagSoup offers a variety of options to customize the import of HTML. For the complete list,
please visit the [http://home.ccil.org/~cowan/XML/tagsoup/ TagSoup] website. BaseX supports
most of these options, with a few exceptions:
 
* '''encoding''': BaseX tries to guess the input encoding, but the user can override it if necessary.
* '''files''': Not supported, as input documents are piped directly to the XML parser.
* '''method''': Set to 'xml' by default. If it is set to 'html', end tags may be missing, for instance.
* '''version''': Ignored, as TagSoup always falls back to 'version 1.0', no matter what the input is.
* '''standalone''': Deactivated.
* '''pyx''': Not supported, as the XML parser cannot handle this kind of input.
* '''pyxin''': See the '''pyx''' option.
* '''reuse''': Not supported.
* '''output-encoding''': Not supported, as BaseX already takes care of the output encoding.
* '''help''': Not supported.
* '''version''': Not supported.
 
These options can be changed like any other option in BaseX, for example via
XQuery:
<pre class="brush:xquery">
declare option db:htmlopt "method=xml";
doc("index.html")
</pre>
 
They can also be set via the [[Commands#SET|SET]] command, e.g.:
 
SET [[Options#HTMLOPT|HTMLOPT]] method=xml
==JSON Parser==