HTML Functions
This module provides functions for converting HTML to XML. Conversion will only take place if TagSoup or Validator.nu is included in the classpath (see HTML Parsing for more details).
Conventions
All functions and errors in this module are assigned to the http://basex.org/modules/html namespace, which is statically bound to the html prefix.
Functions
html:doc
Signature | html:doc( $source as xs:string?, $options as map(*)? := {} ) as document-node()? |
---|---|
Summary | Fetches the HTML document referred to by the given $source, converts it to XML and returns a document node. The $options argument can be used to set HTML Parser Options. |
Errors | |
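As a minimal sketch of how html:doc might be used, the query below fetches a page and extracts its title. It assumes that network access is available and that an HTML parser is in the classpath; the URL is the one used in the binary-input example on this page.

```xquery
(: Sketch: fetch an HTML page, convert it to XML, and return its title.
   Assumptions: the URL is reachable and an HTML parser is available.
   The *:title wildcard matches the element with or without a namespace. :)
let $doc := html:doc('https://en.wikipedia.org')
return $doc//*:title/string()
```

The namespace wildcard is used because whether the result carries the XHTML namespace depends on the parser and its options.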
html:parse
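Signature | html:parse( $value as (xs:string|xs:base64Binary|xs:hexBinary), $options as map(*)? := {} ) as document-node() |
---|---|
Summary | Converts the HTML document specified by $value to XML and returns a document node. The input can be passed as a string or in its binary representation (xs:base64Binary or xs:hexBinary); for binary input, the HTML parser tries to detect the encoding automatically. The $options argument can be used to set HTML Parser Options. |
Errors | |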
html:parser
Signature | html:parser() as xs:string |
---|---|
Summary | Returns the name of the default HTML parser (TagSoup, if available, or Validator.nu). If an empty string is returned, no HTML parser was found in the classpath. If this is the case, the input will be treated as well-formed XML, unless an HTML parser is selected with the method option. |
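The empty-string check described for html:parser can be sketched as follows. The message texts are illustrative only and are not produced by BaseX itself.

```xquery
(: Sketch: report which HTML parser, if any, was found in the classpath.
   The message strings are made up for this example. :)
let $parser := html:parser()
return if ($parser = '') then
  'No HTML parser found: input will be treated as well-formed XML.'
else
  'Default HTML parser: ' || $parser
```

A check like this could be run before a conversion to find out whether malformed HTML will be repaired or rejected.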
Examples
Basic Example
The following query converts the specified string to an XML document node.
Query
html:parse("<html>")
Result
<html/>
Specifying Options
The next query creates an XML document with namespaces:
Query
html:parse("<a href='ok.html'/>", { 'nons': false() })
Result
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<a shape="rect" href="ok.html"/>
</body>
</html>
Parsing Binary Input
If the input encoding is unknown, the data to be processed can be passed on in its binary representation. The HTML parser will automatically try to detect the correct encoding:
Query
html:parse(fetch:binary("https://en.wikipedia.org"))
Result
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Wikipedia, the free encyclopedia</title>
<meta charset="UTF-8"/>
...
Errors
Code | Description |
---|---|
parse | The input cannot be converted to XML. |
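The parse error can be caught with a standard try/catch expression, since the error code lives in the module namespace. The following is a sketch only: the sample input is arbitrary, and whether it raises an error at all depends on which parser is available. A lenient HTML parser may repair the markup, whereas without a parser the input is treated as XML and is expected to be rejected.

```xquery
(: Sketch: catch the module's parse error and return a readable message.
   The input string is an arbitrary, deliberately malformed example. :)
try {
  html:parse('<p>unclosed')
} catch html:parse {
  'Conversion failed: ' || $err:description
}
```

If the conversion succeeds, the query returns the document node instead of the message.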
Changelog
Version 12.0
- Added: support for using Validator.nu
- Added: html:doc
- Updated: error codes updated; errors now use the module namespace
- Added: New module added.