Difference between revisions of "HTML Module"

From BaseX Documentation

Revision as of 22:55, 2 January 2013

This XQuery Module provides functions for converting HTML to XML. The input is only converted if TagSoup (http://home.ccil.org/~cowan/XML/tagsoup/) is included in the classpath (see HTML Parsing for more details).

Conventions

All functions in this module are assigned to the http://basex.org/modules/html namespace, which is statically bound to the html prefix.
All errors are assigned to the http://basex.org/errors namespace, which is statically bound to the bxerr prefix.

Functions

html:processor

Signatures html:processor() as xs:string
Summary Returns the name of the applied HTML processor (currently: "TagSoup" or "BaseX"). If the function returns "BaseX", TagSoup was not found in the classpath, and the input is assumed to be well-formed XML.
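
The check described above can be sketched as follows. This is a minimal illustration rather than an official example; the returned messages are made up, and only html:processor itself is taken from this page.

```xquery
(: Sketch: report which HTML processor is active before parsing.
   The message strings are illustrative assumptions. :)
let $processor := html:processor()
return
  if ($processor = 'TagSoup') then
    'TagSoup found: HTML input will be converted to XML.'
  else
    concat('TagSoup not found (processor: ', $processor,
           '): input must be well-formed XML.')
```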

html:parse

Signatures html:parse($input as xs:anyAtomicType) as document-node()
Summary Converts the HTML document specified by $input to XML and returns a document node. The input may be either a string or a binary item (xs:hexBinary, xs:base64Binary). If the input is passed in its binary representation, the HTML parser will try to choose the correct encoding automatically.
Errors BXHL0001: the input cannot be converted to XML.
Examples
  • html:parse("<html></html>") returns <html/>
  • html:parse(fetch:content-binary("http://en.wikipedia.org")) returns an XML representation of the English Wikipedia main page. The input is passed in its binary representation so that the HTML parser can automatically detect the correct encoding.
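
As a further sketch, the following query parses a small HTML string that is deliberately not well-formed and lists its link targets. The markup and variable names are made up for illustration, and the query assumes that TagSoup is in the classpath; without it, the input would have to be well-formed XML and this call would be expected to fail.

```xquery
(: Sketch: parse non-well-formed HTML and extract link targets.
   The input string is a hypothetical example. :)
let $input := '<html><body><p>See <a href="http://basex.org">BaseX<br></body></html>'
let $doc := html:parse($input)
(: *:a matches the element whether or not the parser assigns a namespace :)
for $link in $doc//*:a
return string($link/@href)
```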

Errors

Code Description
BXHL0001 The input cannot be converted to XML.
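
A minimal sketch of catching this error, assuming the statically bound bxerr prefix described under Conventions. The input string and the message text are illustrative; with TagSoup in the classpath the input may simply be repaired and no error raised.

```xquery
(: Sketch: fall back to a message if the input cannot be converted.
   Whether this particular input fails depends on the active processor. :)
try {
  html:parse('<p>unclosed')
} catch bxerr:BXHL0001 {
  concat('HTML could not be converted: ', $err:description)
}
```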

Changelog

The module was introduced with Version 7.5.1.