
HTML Functions

This module provides functions for converting HTML to XML. Conversion will only take place if TagSoup or Validator.nu is included in the classpath (see HTML Parsing for more details).

Conventions

All functions and errors in this module are assigned to the http://basex.org/modules/html namespace, which is statically bound to the html prefix.

Functions

html:doc

Signature
html:doc(
  $source   as xs:string?,
  $options  as map(*)?     := {}
) as document-node()?
Summary
Fetches the HTML document referred to by the given $source, converts it to XML and returns a document node. The $options argument can be used to set HTML Parser Options.
Errors
parse: The input cannot be converted to XML.
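
The following is a minimal sketch of a call with options, not an authoritative recipe. The URL is a placeholder, and the query assumes that network access is available and that an HTML parser is on the classpath. The nons option is the one used in the examples of this page.

```xquery
(: Hypothetical URL; replace it with a reachable HTML resource. :)
let $url := 'https://example.org/index.html'
(: Keep the XHTML namespace in the result, as in the options example of this page. :)
let $doc := html:doc($url, { 'nons': false() })
(: The wildcard prefix matches the title element with or without a namespace. :)
return $doc//*:title/string()
```

Since the return type is document-node()?, the path expression yields an empty sequence if no document node is returned.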

html:parse

Signature
html:parse(
  $value    as (xs:string|xs:base64Binary|xs:hexBinary),
  $options  as map(*)?                                   := {}
) as document-node()
Summary
Converts the HTML document specified by $value to XML and returns a document node. If the input is passed as binary and no encoding option is supplied, the HTML parser tries to detect the correct encoding automatically.
Errors
parse: The input cannot be converted to XML.
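
The parse error can be caught by its qualified name, as errors of this module live in the module namespace. The sketch below assumes nothing beyond the standard try/catch expression. Whether the error is actually raised for this input depends on the parser in use, since a lenient HTML parser may repair the markup instead of rejecting it.

```xquery
try {
  (: Malformed input; a lenient parser may still convert it without an error. :)
  html:parse('<p>unclosed')
} catch html:parse {
  (: $err:code and $err:description are bound implicitly in the catch clause. :)
  <error code="{ $err:code }">{ $err:description }</error>
}
```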

html:parser

Signature
html:parser() as xs:string
Summary
Returns the name of the default HTML parser (TagSoup, if available, or Validator.nu). If an empty string is returned, no HTML parser was found in the classpath. In this case, the input will be treated as well-formed XML, unless an HTML parser is selected with the method option.
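
As a small sketch, a query can check for an available parser before it attempts a conversion. The message strings are illustrative only.

```xquery
let $parser := html:parser()
return if ($parser = '') then (
  (: An empty string means that no HTML parser was found in the classpath. :)
  'No HTML parser found; input will be treated as well-formed XML.'
) else (
  'Default HTML parser: ' || $parser
)
```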

Examples

Basic Example

The following query converts the specified string to an XML document node.

Query
html:parse("<html>")
Result
<html/>

Specifying Options

The next query creates an XML document with namespaces:

Query
html:parse("<a href='ok.html'/>", { 'nons': false() })
Result
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <a shape="rect" href="ok.html"/>
  </body>
</html>

Parsing Binary Input

If the input encoding is unknown, the data to be processed can be passed on in its binary representation. The HTML parser will automatically try to detect the correct encoding:

Query
html:parse(fetch:binary("https://en.wikipedia.org"))
Result
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Wikipedia, the free encyclopedia</title>
    <meta charset="UTF-8"/>
    ...
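
If the encoding of binary input is known, it can presumably be supplied instead of relying on auto-detection. The sketch below makes two assumptions: the option key is named encoding (inferred from the description of html:parse above), and the bin:encode-string function of the EXPath Binary Module is available to create sample binary data.

```xquery
(: Create sample binary input in a non-UTF-8 encoding. :)
let $binary := bin:encode-string('<p>Grüße</p>', 'ISO-8859-1')
(: Assumption: 'encoding' is the key of the encoding option. :)
return html:parse($binary, { 'encoding': 'ISO-8859-1' })
```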

Errors

Code    Description
parse   The input cannot be converted to XML.

Changelog

Version 12.0
  • Added: support for using Validator.nu
Version 9.4
Version 9.0
  • Updated: error codes updated; errors now use the module namespace
Version 7.6
  • Added: New module added.
