Difference between revisions of "HTML Module"
Revision as of 18:35, 1 December 2023
This XQuery Module provides functions for converting HTML to XML. Conversion will only take place if TagSoup is included in the classpath (see HTML Parsing for more details).
Conventions
All functions and errors in this module are assigned to the http://basex.org/modules/html namespace, which is statically bound to the html prefix.
Functions
html:doc
Signature | html:doc( $href as xs:string?, $options as map(*)? := map { } ) as document-node()? |
Summary | Fetches the HTML document referred to by the given $href, converts it to XML and returns a document node. The $options argument can be used to set TagSoup Options. |
Errors | parse: the input cannot be converted to XML. |
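As a minimal sketch of how html:doc might be called: the URL below is a hypothetical placeholder, and the query assumes that TagSoup is available in the classpath. The wildcard prefix in the path lets the lookup work whether or not the result carries the XHTML namespace.

```xquery
(: Hypothetical URL; replace it with the page you want to convert. :)
let $doc := html:doc('https://example.org/')
(: *:title matches the element with or without the XHTML namespace. :)
return $doc//*:title/string()
```

Options can be passed as a second argument in the same way as for html:parse, for example map { 'nons': false() }.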
html:parse
Signature | html:parse( $value as xs:anyAtomicType, $options as map(*)? := map { } ) as document-node() |
Summary | Converts the HTML document specified by $value to XML and returns a document node. The $options argument can be used to set TagSoup Options. |
Errors | parse: the input cannot be converted to XML. |
html:parser
Signature | html:parser() as xs:string |
Summary | Returns the name of the applied HTML parser (currently: TagSoup). If an empty string is returned, TagSoup was not found in the classpath, and the input will be treated as well-formed XML. |
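The check described above could be sketched as follows. The message strings are illustrative only and are not produced by the module itself.

```xquery
(: An empty string means that TagSoup was not found in the classpath. :)
let $parser := html:parser()
return if ($parser = '') then
  'No HTML parser found: input will be treated as well-formed XML.'
else
  'HTML parser in use: ' || $parser
```

Such a test may be useful before converting input that is not guaranteed to be well-formed.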
Examples
Basic Example
The following query converts the specified string to an XML document node.
- Query
html:parse("<html>")
- Result
<html xmlns="http://www.w3.org/1999/xhtml"/>
Specifying Options
The next query creates an XML document with namespaces:
- Query
html:parse("<a href='ok.html'/>", map { 'nons': false() })
- Result
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <a shape="rect" href="ok.html"/>
  </body>
</html>
Parsing Binary Input
If the input encoding is unknown, the data to be processed can be passed on in its binary representation. The HTML parser will automatically try to detect the correct encoding:
- Query
html:parse(fetch:binary("https://en.wikipedia.org"))
- Result
<html xmlns="http://www.w3.org/1999/xhtml" class="client-nojs" dir="ltr" lang="en">
  <head>
    <title>Wikipedia, the free encyclopedia</title>
    <meta charset="UTF-8"/>
    ...
Errors
Code | Description |
---|---|
parse | The input cannot be converted to XML. |
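A hedged sketch of how the parse error might be caught, assuming Version 9.0 or later, where errors use the module namespace. Whether the sample input actually raises the error depends on the setup: with TagSoup in the classpath it will most likely be repaired and converted, whereas without TagSoup it is not well-formed XML and should be rejected.

```xquery
try {
  (: Unclosed element: not well-formed if no HTML parser is available. :)
  html:parse('<p>unclosed')
} catch html:parse {
  (: $err:description holds the message of the caught error. :)
  'Conversion failed: ' || $err:description
}
```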
Changelog
- Version 9.4
  - Added: html:doc
- Version 9.0
  - Updated: error codes updated; errors now use the module namespace
The module was introduced with Version 7.6.