Fetch Functions

This module provides simple functions to retrieve the contents of resources identified by URIs. Resources can be stored locally or remotely, e.g., using the file:// or http:// scheme. If more control over HTTP requests is required, the HTTP Client Functions can be used. With the HTML Functions, retrieved HTML documents can be converted to XML.
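As a minimal sketch of the last point, the following query fetches a page as binary data and passes it to the HTML module. It assumes that `html:parse` from the HTML Functions accepts binary input and that an HTML parser is available in the installation; the function name may differ between versions.

```xquery
(: Assumption: html:parse (HTML Functions) accepts the fetched binary item
   and converts the HTML page to an XML document node. :)
html:parse(fetch:binary('http://en.wikipedia.org'))
```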

Conventions

All functions and errors in this module are assigned to the http://basex.org/modules/fetch namespace, which is statically bound to the fetch prefix.

URI arguments can be URLs or point to local files. Relative file paths will be resolved against the current working directory (for more details, take a look at the File Functions).

Functions

`fetch:binary`

Signature

fetch:binary(
  $source  as xs:string
) as xs:base64Binary

Summary Fetches the resource referred to by the given source string and returns it as a lazy xs:base64Binary item.

Errors

open The URI could not be resolved, or the resource could not be retrieved.

Examples

fetch:binary("http://images.trulia.com/blogimg/c/5/f/4/679932_1298401950553_o.jpg")

Returns the addressed image.

lazy:cache(fetch:binary("http://en.wikipedia.org"))

Enforces the fetch operation (otherwise, it will be delayed until the result is first requested).

`fetch:text`

Signature

fetch:text(
  $source    as xs:string,
  $encoding  as xs:string?  := (),
  $fallback  as xs:boolean?  := false()
) as xs:string

Summary

Fetches the resource referred to by the given source string and returns it as a lazy xs:string item:

The UTF-8 default encoding can be overwritten with the optional $encoding argument.
By default, invalid characters will be rejected. If $fallback is set to true, these characters will be replaced with the Unicode replacement character U+FFFD (�).

Errors

`encoding`	The specified encoding is not supported, or unknown.
`open`	The URI could not be resolved, or the resource could not be retrieved.

Examples

fetch:text("http://en.wikipedia.org")

Returns a string representation of the English Wikipedia main HTML page.

fetch:text("http://www.bbc.com","US-ASCII",true())

Returns the BBC homepage in US-ASCII with all non-US-ASCII characters replaced with �.

lazy:cache(fetch:text("http://en.wikipedia.org"))

Enforces the fetch operation (otherwise, it will be delayed until the result is first requested).

`fetch:doc`

Signature

fetch:doc(
  $source   as xs:string,
  $options  as map(*)?  := {}
) as document-node()

Summary

Fetches the resource referred to by the given source string and returns it as a document node. The $options argument can be used to change the parsing behavior. Allowed options are all parsing and XML parsing options in lower case. The function differs from fn:doc in various aspects:

It is nondeterministic, i.e., a new document node will be created by each call of this function.
A document created by this function will be garbage-collected as soon as it is not referenced anymore.
URIs will not be resolved against existing databases. As a result, it will not trigger any locks (see limitations of database locking for more details).
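The first difference can be observed with a node identity comparison. This is a sketch only: `doc.xml` is a hypothetical local file that is assumed to exist and to be well-formed.

```xquery
(: 'doc.xml' is a hypothetical file name, not part of the module. :)
let $uri := 'doc.xml'
return (
  (: fn:doc is deterministic: both calls should yield the same node. :)
  doc($uri) is doc($uri),
  (: fetch:doc creates a new document node on each call,
     so the two operands are distinct nodes. :)
  fetch:doc($uri) is fetch:doc($uri)
)
```

Following the description above, the first comparison is expected to succeed and the second to fail, because each fetch:doc call builds its own document.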

Errors

open The URI could not be resolved, or the resource could not be retrieved.

Examples

fetch:doc("http://en.wikipedia.org", { 'stripws': true() })

Retrieves an XML representation of the English Wikipedia main HTML page with whitespace stripped.

fetch:doc(
  'http://basex.org/',
  { 'parser': 'html', 'htmlparser': { 'nons': false() } }
)

Returns a web page as XML and preserves namespaces.

`fetch:binary-doc`

Signature

fetch:binary-doc(
  $input    as (xs:base64Binary|xs:hexBinary),
  $options  as map(*)?  := {}
) as document-node()

Summary Converts the specified $input to XML and returns it as a document node. In contrast to fn:parse-xml, which expects a string, the input can be arbitrarily encoded. The encoding will be derived from the XML declaration or (in case of UTF-16 or UTF-32) from the first bytes of the input. The $options argument can be used to change the parsing behavior. Allowed options are all parsing and XML parsing options in lower case.

Errors

open The URI could not be resolved, or the resource could not be retrieved.

Examples

fetch:binary-doc(file:read-binary('doc.xml'))

Retrieves file input as binary data and parses it as XML.

fetch:binary-doc(convert:string-to-base64(
  "<?xml version='1.0' encoding='CP1252'?><xml>touché</xml>",
  "CP1252"
))

Encodes a string as CP1252 and parses it as XML. The input and the string touché will be correctly decoded because of the XML declaration.

fetch:binary-doc(convert:string-to-base64("<xml/>", "UTF16"))

Encodes a string as UTF-16 and parses it as XML. The document will be correctly decoded, as the first bytes of the data indicate that the input must be UTF-16.

`fetch:content-type`

Signature

fetch:content-type(
  $source  as xs:string
) as xs:string

Summary

Returns the content-type (also called mime-type) of the resource specified by the source string:

If a remote resource is addressed, the Content-Type header of the response will be evaluated.
If the addressed resource is locally stored, the content-type will be guessed based on the file extension.
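Both cases can be compared side by side. In this sketch, the URL is taken from the example below, while `image.png` is a hypothetical local file that is assumed to exist in the current working directory.

```xquery
(: 'image.png' is a hypothetical local file; its content-type is guessed
   from the extension. The remote type comes from the server's reply. :)
for $source in ('https://docs.basex.org/', 'image.png')
return $source || ': ' || fetch:content-type($source)
```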

Errors

open The URI could not be resolved, or the resource could not be retrieved.

Examples

fetch:content-type('https://docs.basex.org/')

Result: 'text/html; charset=UTF-8'

Errors

Code	Description
`encoding`	The specified encoding is not supported, or unknown.
`open`	The URI could not be resolved, or the resource could not be retrieved.
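Since the errors are assigned to the module namespace, they can be caught by name with the fetch prefix. The following is a minimal sketch with a hypothetical, unresolvable URL. Because fetch:text returns a lazy item, the call is wrapped in lazy:cache so that the fetch operation is enforced inside the try block.

```xquery
(: Hypothetical URL; the reserved .invalid domain is not expected to resolve. :)
let $uri := 'http://unknown.invalid/missing.txt'
return try {
  (: Enforce the fetch here, so a failure is raised inside the try block. :)
  lazy:cache(fetch:text($uri))
} catch fetch:open {
  'Could not fetch ' || $uri || ': ' || $err:description
}
```

Without lazy:cache, the retrieval may be delayed until the string is first requested, which can happen after the try block has already been left.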

Changelog

Version 10.0

Updated: fetch:doc renamed (before: fetch:xml).
Updated: fetch:binary-doc renamed (before: fetch:xml-binary).

Version 9.0

Added: fetch:xml-binary
Updated: error codes updated; errors now use the module namespace

Version 8.5

Updated: fetch:text: $fallback argument added.

Version 8.0

Added: fetch:xml

Version 7.6

Added: New module added.
