384 bytes added, 13:58, 3 April 2020
This [[Module Library|XQuery Module]] provides functions for converting HTML to XML. Conversion will only take place if [http://home.ccil.org/~cowan/XML/tagsoup/ TagSoup] is included in the classpath (see [[Parsers#HTML Parser|HTML Parsing]] for more details).
=Conventions=
All functions and errors in this module are assigned to the <code><nowiki>http://basex.org/modules/html</nowiki></code> namespace, which is statically bound to the {{Code|html}} prefix.<br/>
=Functions=
==html:parser==

{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Code|'''html:parser'''() as xs:string}}<br />
|}
==html:parse==
 
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|html:parse|$input as xs:anyAtomicType|document-node()}}<br />{{Func|html:parse|$input as xs:anyAtomicType, $options as map(*)?|document-node()}}<br />
|-
| '''Summary'''
|Converts the HTML document specified by {{Code|$input}} to XML, and returns a document node:<br/>
* The input may either be a string or a binary item (xs:hexBinary, xs:base64Binary).
* If the input is passed on in its binary representation, the HTML parser will try to automatically choose the correct encoding.
The {{Code|$options}} argument can be used to set [[Parsers#TagSoup Options|TagSoup Options]].
|-
| '''Errors'''
|{{Error|parse|#Errors}} the input cannot be converted to XML.
|}

==html:doc==

{{Mark|Introduced with BaseX 9.4:}}

{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|html:doc|$uri as xs:string?|document-node()?}}<br />{{Func|html:doc|$uri as xs:string?, $options as map(*)?|document-node()?}}<br />
|-
| '''Summary'''
|Fetches the HTML document referred to by the given {{Code|$uri}}, converts it to XML and returns a document node. The {{Code|$options}} argument can be used to set [[Parsers#Options|TagSoup Options]].
|-
| '''Errors'''
|{{Error|parse|#Errors}} the input cannot be converted to XML.
|}
==Examples==
===Basic Example===
The following query converts the specified string to an XML document node.
;Query:
<syntaxhighlight lang="xquery">html:parse("<html></html>")</syntaxhighlight>
;Result:
<syntaxhighlight lang="xml">
<html xmlns="http://www.w3.org/1999/xhtml"/>
</syntaxhighlight>
===Specifying Options===
The next query creates an XML document with namespaces:
;Query:
<syntaxhighlight lang="xquery">html:parse("<a href='ok.html'/>", map { 'nons': false() })</syntaxhighlight>
;Result:
<syntaxhighlight lang="xml">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<a shape="rect" href="ok.html"/>
</body>
</html>
</syntaxhighlight>
===Parsing Binary Input===
If the input encoding is unknown, the data to be processed can be passed on in its binary representation. The HTML parser will automatically try to detect the correct encoding:
;Query:
<syntaxhighlight lang="xquery">html:parse(fetch:content-binary("https://en.wikipedia.org"))</syntaxhighlight>
;Result:
<syntaxhighlight lang="xml">
<html xmlns="http://www.w3.org/1999/xhtml" class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="UTF-8"/>
...
</syntaxhighlight>
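For remote documents, the fetch and parse steps can presumably be combined with {{Code|html:doc}} (BaseX 9.4 or later). The following is a minimal sketch, assuming that TagSoup is in the classpath and that the URL is reachable; the {{Code|nons}} option is taken from the previous example, and the result depends on the page that is retrieved:
;Query:
<syntaxhighlight lang="xquery">
(: fetch an HTML page and convert it to XML in a single step :)
html:doc("https://en.wikipedia.org", map { 'nons': false() })
</syntaxhighlight>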
=Errors=
{| class="wikitable" width="100%"
! width="110"|Code
! Description
|-
|{{Code|parse}}
|The input cannot be converted to XML.
|}
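Since the error is assigned to the module namespace, it can be caught by its qualified name. The following sketch assumes the statically bound {{Code|html}} prefix; the input string is purely illustrative, and whether an error is actually raised for it depends on the parser in use, as TagSoup is designed to accept most malformed input:
;Query:
<syntaxhighlight lang="xquery">
try {
  (: illustrative input with an unclosed element :)
  html:parse("<p>unclosed")
} catch html:parse {
  (: $err:description is the standard XQuery 3.0 error variable :)
  'Conversion failed: ' || $err:description
}
</syntaxhighlight>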
=Changelog=
;Version 9.4
* Added: [[#html:doc|html:doc]]

;Version 9.0
* Updated: error codes updated; errors now use the module namespace

The module was introduced with Version 7.6.

[[Category:XQuery]]