298 bytes added ,  18:50, 18 November 2020
This [[Module Library|XQuery Module]] provides functions for converting HTML to XML. Conversion will only take place if [http://home.ccil.org/~cowan/XML/tagsoup/ TagSoup] is included in the classpath (see [[Parsers#HTML Parser|HTML Parsing]] for more details).
=Conventions=
All functions and errors in this module are assigned to the <code><nowiki>http://basex.org/modules/html</nowiki></code> namespace, which is statically bound to the {{Code|html}} prefix.
=Functions=
==html:parser==

{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Code|'''html:parser'''() as xs:string}}<br />
|}
==html:parse==
 
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|html:parse|$input as xs:anyAtomicType|document-node()}}<br />{{Func|html:parse|$input as xs:anyAtomicType, $options as map(*)?|document-node()}}<br />
|-
| '''Summary'''
|Converts the HTML document specified by {{Code|$input}} to XML, and returns a document node:<br/>
* The input may either be a string or a binary item (xs:hexBinary, xs:base64Binary).
* If the input is passed on in its binary representation, the HTML parser will try to automatically choose the correct encoding.
The {{Code|$options}} argument can be used to set [[Parsers#TagSoup Options|TagSoup Options]], which are passed as a map of key/value pairs:
<syntaxhighlight lang="xquery">map { "key1": "value1", ... }</syntaxhighlight>
|-
| '''Errors'''
|{{Error|parse|#Errors}} the input cannot be converted to XML.
|}

==html:doc==

{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|html:doc|$uri as xs:string?|document-node()?}}<br />{{Func|html:doc|$uri as xs:string?, $options as map(*)?|document-node()?}}<br />
|-
| '''Summary'''
|Fetches the HTML document referred to by the given {{Code|$uri}}, converts it to XML and returns a document node. The {{Code|$options}} argument can be used to set [[Parsers#Options|TagSoup Options]].
|-
| '''Errors'''
|{{Error|parse|#Errors}} the input cannot be converted to XML.
|}
;Query:
<syntaxhighlight lang="xquery">
html:parse("<html>")
</syntaxhighlight>
;Result:
<syntaxhighlight lang="xml">
<html xmlns="http://www.w3.org/1999/xhtml"/>
</syntaxhighlight>
===Specifying Options===
The next query creates an XML document with namespaces:
;Query:
<syntaxhighlight lang="xquery">
html:parse("<a href='ok.html'/>", map { 'nons': false() })
</syntaxhighlight>
;Result:
<syntaxhighlight lang="xml">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<a shape="rect" href="ok.html"/>
</body>
</html>
</syntaxhighlight>
===Parsing Binary Input===
;Query:
<syntaxhighlight lang="xquery">
html:parse(fetch:content-binary("https://en.wikipedia.org"))
</syntaxhighlight>
;Result:
<syntaxhighlight lang="xml">
<html xmlns="http://www.w3.org/1999/xhtml" class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="UTF-8"/>
...
</syntaxhighlight>
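===Fetching and Converting a Document===
The following is a minimal sketch of how [[#html:doc|html:doc]] might be called, based on the signatures listed above. It assumes that TagSoup is in the classpath and that the URL, chosen here only for illustration, is reachable. The {{Code|nons}} option is the same one used in the ''Specifying Options'' example.
;Query:
<syntaxhighlight lang="xquery">
(: illustrative URL; any reachable HTML resource would do :)
html:doc("https://en.wikipedia.org", map { 'nons': false() })
</syntaxhighlight>
Compared with fetching the resource first and passing it to [[#html:parse|html:parse]], this combines both steps in a single call.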
=Errors=

{| width='100%'
|-
| width='120' | '''Code'''
|'''Description'''
|-
|{{Code|parse}}
|The input cannot be converted to XML.
|}
=Changelog=
 
;Version 9.4
 
* Added: [[#html:doc|html:doc]]
 
;Version 9.0
 
* Updated: error codes updated; errors now use the module namespace
The module was introduced with Version 7.6.
 
[[Category:XQuery]]