814 bytes added ,  13:31, 28 June 2019
Made link to Wikipedia HTTPS for binary example - as HTTP returns nothing
This [[Module Library|XQuery Module]] provides functions for converting HTML to XML. Conversion will only take place if [http://home.ccil.org/~cowan/XML/tagsoup/ TagSoup] is included in the classpath (see [[Parsers#HTML Parser|HTML Parsing]] for more details).
=Conventions=
All functions and errors in this module are assigned to the <code><nowiki>http://basex.org/modules/html</nowiki></code> namespace, which is statically bound to the {{Code|html}} prefix.
=Functions=
==html:parser==
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Code|'''html:parser'''() as xs:string}}<br />
|-
| '''Summary'''
|Returns the name of the applied HTML parser (currently: {{Code|TagSoup}}). If an ''empty string'' is returned, TagSoup was not found in the classpath, and the input will be treated as well-formed XML.<br />
|}
==html:parse==
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|html:parse|$input as xs:anyAtomicType|document-node()}}<br />{{Func|html:parse|$input as xs:anyAtomicType, $options as map(*)?|document-node()}}<br />
|-
| '''Summary'''
|Converts the HTML document specified by {{Code|$input}} to XML, and returns a document node:
* The input may either be a string or a binary item (xs:hexBinary, xs:base64Binary).
* If the input is passed on in its binary representation, the HTML parser will try to automatically choose the correct encoding.
The {{Code|$options}} argument can be used to set [[Parsers#Options|TagSoup Options]].
|-
| '''Errors'''
|{{Error|parse|#Errors}} the input cannot be converted to XML.
|}
 
=Examples=
 
===Basic Example===
 
The following query converts the specified string to an XML document node.
 
;Query:
<pre class="brush:xquery">
html:parse("<html>")
</pre>
 
;Result:
<pre class="brush:xml">
<html xmlns="http://www.w3.org/1999/xhtml"/>
</pre>
 
===Specifying Options===
 
The next query creates an XML document with namespaces:
 
;Query:
<pre class="brush:xquery">
html:parse("<a href='ok.html'/>", map { 'nons': false() })
</pre>
 
;Result:
<pre class="brush:xml">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<a shape="rect" href="ok.html"/>
</body>
</html>
</pre>
 
===Parsing Binary Input===
 
If the input encoding is unknown, the data to be processed can be passed on in its binary representation.
The HTML parser will automatically try to detect the correct encoding:
 
;Query:
<pre class="brush:xquery">
html:parse(fetch:binary("https://en.wikipedia.org"))
</pre>
 
;Result:
<pre class="brush:xml">
<html xmlns="http://www.w3.org/1999/xhtml" class="client-nojs" dir="ltr" lang="en">
<head>
<title>Wikipedia, the free encyclopedia</title>
<meta charset="UTF-8"/>
...
</pre>
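
===Checking the Available Parser===

As a minimal sketch of how {{Code|html:parser}} might be used before a conversion, the following query reports whether TagSoup was found. The message strings are illustrative only; the result depends on the local classpath, so none is shown here.

;Query:
<pre class="brush:xquery">
(: html:parser() returns an empty string if TagSoup is not in the classpath :)
let $parser := html:parser()
return if ($parser = '') then
  'TagSoup not found: input will be treated as well-formed XML'
else
  'HTML parser in use: ' || $parser
</pre>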
=Errors=
{| class="wikitable" width="100%"
! width="110"|Code
! Description
|-
|{{Code|parse}}
|The input cannot be converted to XML.
|}
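
A failed conversion can be handled with a standard XQuery 3.0 {{Code|try}}/{{Code|catch}} expression. The following is a sketch that assumes the error is raised in the module namespace; the sample input and the {{Code|error}} wrapper element are illustrative. Whether the error actually occurs depends on the parser in use: TagSoup typically repairs input like this, whereas without TagSoup the input must be well-formed XML.

<pre class="brush:xquery">
try {
  (: not well-formed XML: the closing tag is missing :)
  html:parse("<p>unclosed")
} catch html:parse {
  (: $err:description holds the message of the caught error :)
  <error>{ $err:description }</error>
}
</pre>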
=Changelog=
;Version 9.0
* Updated: error codes updated; errors now use the module namespace

The module was introduced with Version 7.6.

[[Category:XQuery]]