1,488 bytes added ,  13:58, 3 April 2020
This [[Module Library|XQuery Module]] provides functions for converting HTML to XML. Conversion will only take place if [http://home.ccil.org/~cowan/XML/tagsoup/ TagSoup] is included in the classpath (see [[Parsers#HTML Parser|HTML Parsing]] for more details).
=Conventions=
All functions and errors in this module are assigned to the <code><nowiki>http://basex.org/modules/html</nowiki></code> namespace, which is statically bound to the {{Code|html}} prefix.
=Functions=
==html:parser==
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Code|'''html:parser'''() as xs:string}}<br />
|-
| '''Summary'''
|Returns the name of the applied HTML parser (currently: {{Code|TagSoup}}). If an ''empty string'' is returned, TagSoup was not found in the classpath, and the input will be treated as well-formed XML.<br />
|}
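
The check described above can be sketched as a small query. The returned messages are illustrative and not part of the module:

<syntaxhighlight lang="xquery">
(: html:parser() yields an empty string if TagSoup is not in the classpath. :)
let $parser := html:parser()
return if ($parser = '') then
  (: Hypothetical message: input is treated as well-formed XML in this case. :)
  'No HTML parser found: input will be treated as well-formed XML.'
else
  'HTML input will be converted with ' || $parser || '.'
</syntaxhighlight>
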
==html:parse==
 
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|html:parse|$input as xs:anyAtomicType|document-node()}}<br />{{Func|html:parse|$input as xs:anyAtomicType, $options as map(*)?|document-node()}}<br />
|-
| '''Summary'''
|Converts the HTML document specified by {{Code|$input}} to XML and returns a document node:
* The input may either be a string or a binary item (xs:hexBinary, xs:base64Binary).
* If the input is passed on in its binary representation, the HTML parser will try to automatically choose the correct encoding.
The {{Code|$options}} argument can be used to set [[Parsers#Options|TagSoup Options]].
|-
| '''Errors'''
|{{Error|parse|#Errors}} the input cannot be converted to XML.
|}

==html:doc==

{{Mark|Introduced with BaseX 9.4:}}

{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|html:doc|$uri as xs:string?|document-node()?}}<br />{{Func|html:doc|$uri as xs:string?, $options as map(*)?|document-node()?}}<br />
|-
| '''Summary'''
|Fetches the HTML document referred to by the given {{Code|$uri}}, converts it to XML and returns a document node. The {{Code|$options}} argument can be used to set [[Parsers#Options|TagSoup Options]].
|-
| '''Errors'''
|{{Error|parse|#Errors}} the input cannot be converted to XML.
|}
 
=Examples=
 
===Basic Example===
 
The following query converts the specified string to an XML document node.
 
;Query:
<syntaxhighlight lang="xquery">
html:parse("<html>")
</syntaxhighlight>
 
;Result:
<syntaxhighlight lang="xml">
<html xmlns="http://www.w3.org/1999/xhtml"/>
</syntaxhighlight>
 
===Specifying Options===
 
The next query creates an XML document with namespaces:
 
;Query:
<syntaxhighlight lang="xquery">
html:parse("<a href='ok.html'/>", map { 'nons': false() })
</syntaxhighlight>
 
;Result:
<syntaxhighlight lang="xml">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<a shape="rect" href="ok.html"/>
</body>
</html>
</syntaxhighlight>
 
===Parsing Binary Input===
 
If the input encoding is unknown, the data to be processed can be passed on in its binary representation.
The HTML parser will automatically try to detect the correct encoding:
 
;Query:
<syntaxhighlight lang="xquery">
html:parse(fetch:binary("https://en.wikipedia.org"))
</syntaxhighlight>
 
;Result:
<syntaxhighlight lang="xml">
<html xmlns="http://www.w3.org/1999/xhtml" class="client-nojs" dir="ltr" lang="en">
<head>
<title>Wikipedia, the free encyclopedia</title>
<meta charset="UTF-8"/>
...
</syntaxhighlight>
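
===Fetching a Document by URI===

As a minimal sketch, [[#html:doc|html:doc]] (BaseX 9.4 or later) can be used when the document is addressed by a URI, so that no separate fetch call is needed. The path expression and the wildcard name test are illustrative choices, and the result depends on the page retrieved at query time:

;Query:
<syntaxhighlight lang="xquery">
(: Fetch the page, convert it to XML and select its title element. :)
(: *:title matches the element whether or not it is in the XHTML namespace. :)
html:doc("https://en.wikipedia.org")//*:title
</syntaxhighlight>
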
=Errors=
{| class="wikitable" width="100%"
! width="110"|Code
! Description
|-
|{{Code|parse}}
|The input cannot be converted to XML.
|}
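
As a minimal sketch of handling a failed conversion, the error can be caught by its name in the {{Code|html}} namespace. This assumes BaseX 9.0 or later, where errors use the module namespace. The input string and the message are hypothetical, and whether the error is raised depends on the parser setup and the input:

<syntaxhighlight lang="xquery">
try {
  (: Hypothetical input: not well-formed if it is treated as XML. :)
  html:parse("<p>unclosed paragraph")
} catch html:parse {
  (: $err:description is bound implicitly inside the catch clause. :)
  'Conversion failed: ' || $err:description
}
</syntaxhighlight>
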
=Changelog=

;Version 9.4
* Added: [[#html:doc|html:doc]]

;Version 9.0
* Updated: error codes updated; errors now use the module namespace

The module was introduced with Version 7.6.

[[Category:XQuery]]