Difference between revisions of "HTML Module"
Revision as of 00:50, 3 January 2013
This XQuery Module provides functions for converting HTML to XML. The input will only be converted if TagSoup is included in the classpath (see HTML Parsing for more details).
Conventions
All functions in this module are assigned to the http://basex.org/modules/html namespace, which is statically bound to the html prefix.
All errors are assigned to the http://basex.org/errors namespace, which is statically bound to the bxerr prefix.
Functions
html:parser
Signatures | html:parser() as xs:string |
Summary | Returns the name of the applied HTML parser (currently: TagSoup). If an empty string is returned, TagSoup was not found in the classpath, and the input will be treated as well-formed XML. |
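As a minimal sketch, the check described in the summary could be written as follows. It relies only on the statically bound html prefix; the wording of the returned messages is illustrative and not part of the module.

```xquery
(: Ask the module which HTML parser it applies.
   An empty string means TagSoup was not found in the classpath. :)
let $parser := html:parser()
return
  if ($parser = '') then
    'No HTML parser found: input will be treated as well-formed XML'
  else
    concat('HTML parser in use: ', $parser)
```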
html:parse
Signatures | html:parse($input as xs:anyAtomicType) as document-node()
html:parse($input as xs:anyAtomicType, $options as item()) as document-node() |
Summary | Converts the HTML document specified by $input to XML and returns a document node. The input may be either a string or a binary item (xs:hexBinary, xs:base64Binary). If the input is passed in its binary representation, the HTML parser tries to detect the correct encoding automatically.
The $options argument can be used to set TagSoup options (http://home.ccil.org/~cowan/XML/tagsoup/#program). It can be specified
* as children of an <html:options/> element; e.g.:
<html:options> <html:key1 value='value1'/> ... </html:options>
* as a map containing the key/value pairs; e.g.:
map { "key1" := "value1", ... } |
Errors | BXHL0001: the input cannot be converted to XML. |
Examples |
<html> <body> <a shape="rect" href="ok.html"/> </body> </html>
|
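The two ways of passing options can be sketched as below. This is an illustration under assumptions, not the query that originally accompanied the result above: the input string is made up, and nons (TagSoup's switch for suppressing namespaces) is assumed to be accepted as an option key with a boolean value. With TagSoup in the classpath, both calls should yield a document resembling the result listed under Examples.

```xquery
(: Hypothetical input: a lone anchor element without html or body wrappers :)
let $input := "<a href='ok.html'/>"

(: Options passed as a map, using the map syntax shown in the summary :)
let $from-map := html:parse($input, map { "nons" := true() })

(: The same option passed as a child of an <html:options/> element :)
let $from-element := html:parse(
  $input,
  <html:options>
    <html:nons value='true'/>
  </html:options>
)

return ($from-map, $from-element)
```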
Errors
Code | Description |
---|---|
BXHL0001 | The input cannot be converted to XML. |
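Because the bxerr prefix is statically bound, the error can be caught by name with an XQuery 3.0 try/catch expression. The following is a sketch only: the input string and the fallback element are hypothetical, and whether a given input actually raises the error depends on the parser in use.

```xquery
(: Hypothetical input with unclosed tags :)
let $input := "<p>Hello<br>world"
return
  try {
    (: Returns a document node if the conversion succeeds :)
    html:parse($input)
  } catch bxerr:BXHL0001 {
    (: Fallback element returned only if the conversion fails :)
    <error code="BXHL0001">The input could not be converted to XML.</error>
  }
```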
Changelog
The module was introduced with Version 7.5.1.