Difference between revisions of "Fetch Module"
(12 intermediate revisions by 2 users not shown) | |||
Line 3: | Line 3: | ||
=Conventions= | =Conventions= | ||
− | All functions in this module are assigned to the <code><nowiki>http://basex.org/modules/fetch</nowiki></code> namespace, which is statically bound to the {{Code|fetch}} prefix.<br/> | + | All functions and errors in this module are assigned to the <code><nowiki>http://basex.org/modules/fetch</nowiki></code> namespace, which is statically bound to the {{Code|fetch}} prefix.<br/> |
− | |||
URI arguments can be URLs or point to local files. Relative file paths will be resolved against the ''current working directory'' (for more details, have a look at the [[File Module#File Paths|File Module]]). | URI arguments can be URLs or point to local files. Relative file paths will be resolved against the ''current working directory'' (for more details, have a look at the [[File Module#File Paths|File Module]]). | ||
Line 11: | Line 10: | ||
==fetch:binary== | ==fetch:binary== | ||
+ | |||
{| width='100%' | {| width='100%' | ||
|- | |- | ||
Line 17: | Line 17: | ||
|- | |- | ||
| '''Summary''' | | '''Summary''' | ||
− | |Fetches the resource referred to by the given URI and returns it as [[ | + | |Fetches the resource referred to by the given URI and returns it as [[Lazy Module|lazy]] {{Code|xs:base64Binary}} item. |
|- | |- | ||
| '''Errors''' | | '''Errors''' | ||
− | |{{Error| | + | |{{Error|open|#Errors}} the URI could not be resolved, or the resource could not be retrieved. |
|- | |- | ||
| '''Examples''' | | '''Examples''' | ||
| | | | ||
* <code><nowiki>fetch:binary("http://images.trulia.com/blogimg/c/5/f/4/679932_1298401950553_o.jpg")</nowiki></code> returns the addressed image. | * <code><nowiki>fetch:binary("http://images.trulia.com/blogimg/c/5/f/4/679932_1298401950553_o.jpg")</nowiki></code> returns the addressed image. | ||
− | * <code><nowiki> | + | * <code><nowiki>lazy:cache(fetch:binary("http://en.wikipedia.org"))</nowiki></code> enforces the fetch operation (otherwise, it will be delayed until requested first). |
|} | |} | ||
Line 36: | Line 36: | ||
|- | |- | ||
| '''Summary''' | | '''Summary''' | ||
− | |Fetches the resource referred to by the given {{Code|$uri}} and returns it as [[ | + | |Fetches the resource referred to by the given {{Code|$uri}} and returns it as [[Lazy Module|lazy]] {{Code|xs:string}} item: |
* The UTF-8 default encoding can be overwritten with the optional {{Code|$encoding}} argument. | * The UTF-8 default encoding can be overwritten with the optional {{Code|$encoding}} argument. | ||
* By default, invalid characters will be rejected. If {{Code|$fallback}} is set to true, these characters will be replaced with the Unicode replacement character <code>FFFD</code> (�). | * By default, invalid characters will be rejected. If {{Code|$fallback}} is set to true, these characters will be replaced with the Unicode replacement character <code>FFFD</code> (�). | ||
|- | |- | ||
| '''Errors''' | | '''Errors''' | ||
− | |{{Error| | + | |{{Error|open|#Errors}} the URI could not be resolved, or the resource could not be retrieved.<br/>{{Error|encoding|#Errors}} the specified encoding is not supported, or unknown. |
|- | |- | ||
| '''Examples''' | | '''Examples''' | ||
| | | | ||
* <code><nowiki>fetch:text("http://en.wikipedia.org")</nowiki></code> returns a string representation of the English Wikipedia main HTML page. | * <code><nowiki>fetch:text("http://en.wikipedia.org")</nowiki></code> returns a string representation of the English Wikipedia main HTML page. | ||
− | * <code><nowiki> | + | * <code><nowiki>fetch:text("http://www.bbc.com","US-ASCII",true())</nowiki></code> returns the BBC homepage in US-ASCII with all non-US-ASCII characters replaced with �. |
+ | * <code><nowiki>lazy:cache(fetch:text("http://en.wikipedia.org"))</nowiki></code> enforces the fetch operation (otherwise, it will be delayed until requested first). | ||
|} | |} | ||
Line 54: | Line 55: | ||
|- | |- | ||
| width='120' | '''Signatures''' | | width='120' | '''Signatures''' | ||
− | |{{Func|fetch:xml|$uri as xs:string|document-node()}}<br/>{{Func|fetch:xml|$uri as xs:string, $options as map(*)|document-node()}} | + | |{{Func|fetch:xml|$uri as xs:string|document-node()}}<br/>{{Func|fetch:xml|$uri as xs:string, $options as map(*)?|document-node()}} |
|- | |- | ||
| '''Summary''' | | '''Summary''' | ||
− | |Fetches the resource referred to by the given {{Code|$uri}} and returns it as | + | |Fetches the resource referred to by the given {{Code|$uri}} and returns it as XML document node.<br/>In contrast to <code>fn:doc</code>, each function call returns a different document node. As a consequence, document instances created by this function will not be kept in memory until the end of query evaluation.<br/>The {{Code|$options}} argument can be used to change the parsing behavior. Allowed options are all [[Options#Parsing|parsing]] and [[Options#XML Parsing|XML parsing]] options in lower case. |
|- | |- | ||
| '''Errors''' | | '''Errors''' | ||
− | |{{Error| | + | |{{Error|open|#Errors}} the URI could not be resolved, or the resource could not be retrieved. |
|- | |- | ||
| '''Examples''' | | '''Examples''' | ||
Line 71: | Line 72: | ||
<pre class="brush:xquery"> | <pre class="brush:xquery"> | ||
fetch:xml(file:base-dir() || "example.xml") | fetch:xml(file:base-dir() || "example.xml") | ||
+ | </pre> | ||
+ | * Return a web page as XML, preserve namespaces: | ||
+ | <pre class="brush:xquery"> | ||
+ | fetch:xml( | ||
+ | 'http://basex.org/', | ||
+ | map { | ||
+ | 'parser': 'html', | ||
+ | 'htmlparser': map { 'nons': false() } | ||
+ | } | ||
+ | ) | ||
+ | </pre> | ||
+ | |} | ||
+ | |||
+ | ==fetch:xml-binary== | ||
+ | |||
+ | {| width='100%' | ||
+ | |- | ||
+ | | width='120' | '''Signatures''' | ||
+ | |{{Func|fetch:xml-binary|$data as xs:base64Binary|document-node()}}<br/>{{Func|fetch:xml-binary|$data as xs:base64Binary, $options as map(*)?|document-node()}} | ||
+ | |- | ||
+ | | '''Summary''' | ||
+ | |Parses binary {{Code|$data}} and returns it as XML document node.<br/>In contrast to fn:parse-xml, which expects an XQuery string, the input of this function can be arbitrarily encoded. The encoding will be derived from the XML declaration or (in case of UTF16 or UTF32) from the first bytes of the input.<br/>The {{Code|$options}} argument can be used to change the parsing behavior. Allowed options are all [[Options#Parsing|parsing]] and [[Options#XML Parsing|XML parsing]] options in lower case. | ||
+ | |- | ||
+ | | '''Examples''' | ||
+ | | | ||
+ | * Retrieves file input as binary data and parses it as XML: | ||
+ | <pre class="brush:xquery"> | ||
+ | fetch:xml-binary(file:read-binary('doc.xml')) | ||
+ | </pre> | ||
+ | * Encodes a string as CP1252 and parses it as XML. The input and the string {{Code|touché}} will be correctly decoded because of the XML declaration: | ||
+ | <pre class="brush:xquery"> | ||
+ | fetch:xml-binary(convert:string-to-base64( | ||
+ | "<?xml version='1.0' encoding='CP1252'?><xml>touché</xml>", | ||
+ | "CP1252" | ||
+ | )) | ||
+ | </pre> | ||
+ | * Encodes a string as UTF16 and parses it as XML. The document will be correctly decoded, as the first bytes of the data indicate that the input must be UTF16: | ||
+ | <pre class="brush:xquery"> | ||
+ | fetch:xml-binary(convert:string-to-base64("<xml/>", "UTF16")) | ||
</pre> | </pre> | ||
|} | |} | ||
==fetch:content-type== | ==fetch:content-type== | ||
+ | |||
{| width='100%' | {| width='100%' | ||
|- | |- | ||
Line 86: | Line 127: | ||
|- | |- | ||
| '''Errors''' | | '''Errors''' | ||
− | |{{Error| | + | |{{Error|open|#Errors}} the URI could not be resolved, or the resource could not be retrieved. |
|- | |- | ||
| '''Examples''' | | '''Examples''' | ||
Line 99: | Line 140: | ||
|Description | |Description | ||
|- | |- | ||
− | |{{Code| | + | |{{Code|encoding}} |
+ | |The specified encoding is not supported, or unknown. | ||
+ | |- | ||
+ | |{{Code|open}} | ||
|The URI could not be resolved, or the resource could not be retrieved. | |The URI could not be resolved, or the resource could not be retrieved. | ||
− | |||
− | |||
− | |||
|} | |} | ||
=Changelog= | =Changelog= | ||
+ | |||
+ | ;Version 9.0 | ||
+ | |||
+ | * Added: [[#fetch:xml-binary|fetch:xml-binary]] | ||
+ | * Updated: error codes updated; errors now use the module namespace | ||
;Version 8.5 | ;Version 8.5 |
Revision as of 12:42, 29 July 2018
This XQuery Module provides simple functions to fetch the content of resources identified by URIs. Resources can be stored locally or remotely and use, for example, the file:// or http:// scheme. If more control over HTTP requests is required, the HTTP Module can be used. With the HTML Module, retrieved HTML documents can be converted to XML.
Conventions
All functions and errors in this module are assigned to the http://basex.org/modules/fetch namespace, which is statically bound to the fetch prefix.
URI arguments can be URLs or point to local files. Relative file paths will be resolved against the current working directory (for more details, have a look at the File Module).
Functions
fetch:binary
Signatures | fetch:binary($uri as xs:string) as xs:base64Binary |
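Summary | Fetches the resource referred to by the given URI and returns it as a lazy xs:base64Binary item. |
Errors | open: the URI could not be resolved, or the resource could not be retrieved. |
Examples |
- fetch:binary("http://images.trulia.com/blogimg/c/5/f/4/679932_1298401950553_o.jpg") returns the addressed image.
- lazy:cache(fetch:binary("http://en.wikipedia.org")) enforces the fetch operation (otherwise, it is delayed until the item is first requested).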
fetch:text
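Signatures | fetch:text($uri as xs:string) as xs:string
fetch:text($uri as xs:string, $encoding as xs:string) as xs:string
fetch:text($uri as xs:string, $encoding as xs:string, $fallback as xs:boolean) as xs:string |
Summary | Fetches the resource referred to by the given $uri and returns it as a lazy xs:string item:
- The UTF-8 default encoding can be overridden with the optional $encoding argument.
- By default, invalid characters are rejected. If $fallback is set to true, these characters are replaced with the Unicode replacement character FFFD (�).
|
Errors | open: the URI could not be resolved, or the resource could not be retrieved.
encoding: the specified encoding is not supported, or unknown. |
Examples |
- fetch:text("http://en.wikipedia.org") returns a string representation of the English Wikipedia main HTML page.
- fetch:text("http://www.bbc.com","US-ASCII",true()) returns the BBC homepage in US-ASCII with all non-US-ASCII characters replaced with �.
- lazy:cache(fetch:text("http://en.wikipedia.org")) enforces the fetch operation (otherwise, it is delayed until the item is first requested).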
fetch:xml
Signatures | fetch:xml($uri as xs:string) as document-node()
fetch:xml($uri as xs:string, $options as map(*)?) as document-node() |
Summary | Fetches the resource referred to by the given $uri and returns it as an XML document node. In contrast to fn:doc, each function call returns a different document node. As a consequence, document instances created by this function will not be kept in memory until the end of query evaluation. The $options argument can be used to change the parsing behavior. Allowed options are all parsing and XML parsing options in lower case. |
Errors | open: the URI could not be resolved, or the resource could not be retrieved. |
Examples |
fetch:xml("http://en.wikipedia.org", map { 'chop': true() })
fetch:xml(file:base-dir() || "example.xml")
- Return a web page as XML, preserve namespaces:
fetch:xml(
  'http://basex.org/',
  map {
    'parser': 'html',
    'htmlparser': map { 'nons': false() }
  }
)
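The contrast with fn:doc described in the summary can be observed with a small query. This is only a sketch: it assumes that a file named example.xml exists next to the query, and it relies on the node identity operator `is`. According to the description above, the first comparison is expected to be true and the second false.

```xquery
(: Assumption: example.xml exists in the directory of the query file :)
let $uri := file:base-dir() || 'example.xml'
return (
  (: fn:doc returns the same document node for the same URI within one query :)
  doc($uri) is doc($uri),
  (: fetch:xml is described as creating a new document node on every call :)
  fetch:xml($uri) is fetch:xml($uri)
)
```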
fetch:xml-binary
Signatures | fetch:xml-binary($data as xs:base64Binary) as document-node()
fetch:xml-binary($data as xs:base64Binary, $options as map(*)?) as document-node() |
Summary | Parses binary $data and returns it as an XML document node. In contrast to fn:parse-xml, which expects an XQuery string, the input of this function can be arbitrarily encoded. The encoding will be derived from the XML declaration or (in case of UTF16 or UTF32) from the first bytes of the input. The $options argument can be used to change the parsing behavior. Allowed options are all parsing and XML parsing options in lower case. |
Examples |
- Retrieves file input as binary data and parses it as XML:
fetch:xml-binary(file:read-binary('doc.xml'))
- Encodes a string as CP1252 and parses it as XML. The input and the string touché will be correctly decoded because of the XML declaration:
fetch:xml-binary(convert:string-to-base64(
  "<?xml version='1.0' encoding='CP1252'?><xml>touché</xml>",
  "CP1252"
))
- Encodes a string as UTF16 and parses it as XML. The document will be correctly decoded, as the first bytes of the data indicate that the input must be UTF16:
fetch:xml-binary(convert:string-to-base64("<xml/>", "UTF16"))
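Since fetch:binary returns an xs:base64Binary item, its result can presumably be passed on to fetch:xml-binary directly. The following sketch assumes a file doc.xml in the directory of the query (the location is hypothetical) and reuses the chop option shown for fetch:xml:

```xquery
(: Assumption: doc.xml exists next to the query and may use any encoding
   that is announced in its XML declaration :)
let $uri := file:base-dir() || 'doc.xml'
(: fetch the raw bytes first, without decoding them as a string :)
let $data := fetch:binary($uri)
(: parse the bytes as XML; the options map is optional :)
return fetch:xml-binary($data, map { 'chop': false() })
```

Compared to calling fetch:xml on the URI, the two-step form keeps the raw bytes available, for example if they are also to be stored or inspected.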
fetch:content-type
Signatures | fetch:content-type($uri as xs:string) as xs:string |
Summary | Returns the content-type (also called mime-type) of the resource specified by $uri:
|
Errors | open: the URI could not be resolved, or the resource could not be retrieved.
|
Examples |
|
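One possible use of this function is to decide how a resource should be fetched. The sketch below is an illustration only: the test for the substring xml is a simplifying assumption, not a rule stated by the module, and the exact string returned for a given resource depends on the server or file.

```xquery
let $uri := 'http://basex.org/'
let $type := fetch:content-type($uri)
return if (contains($type, 'xml')) then (
  (: the resource claims to be XML: parse it as a document node :)
  fetch:xml($uri)
) else (
  (: anything else is returned as a plain string :)
  fetch:text($uri)
)
```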
Errors
Code | Description |
---|---|
encoding | The specified encoding is not supported, or unknown. |
open | The URI could not be resolved, or the resource could not be retrieved. |
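Because the errors are assigned to the module namespace, they can be caught by name with the standard XQuery 3.0 try/catch expression. The following is a minimal sketch; the address is hypothetical and was chosen only so that the request is likely to fail.

```xquery
(: Hypothetical address, used for illustration only :)
let $uri := 'http://localhost:9/missing.txt'
return try {
  fetch:text($uri, 'UTF-8')
} catch fetch:open | fetch:encoding {
  (: $err:code and $err:description are the standard catch variables :)
  'Fetch failed (' || $err:code || '): ' || $err:description
}
```

Listing both codes in one catch clause keeps the fallback in a single place; separate catch clauses can be used if the two cases need different handling.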
Changelog
- Version 9.0
- Added: fetch:xml-binary
- Updated: error codes updated; errors now use the module namespace
- Version 8.5
- Updated: fetch:text: $fallback argument added.
- Version 8.0
- Added: fetch:xml
The module was introduced with Version 7.6.