Difference between revisions of "HTTP Client Module"

From BaseX Documentation
Revision as of 18:58, 16 February 2018

This XQuery Module contains a single function to send HTTP requests and handle HTTP responses. The function send-request is based on the EXPath HTTP Client Module. It gives full control over the available request and response parameters. For simple GET requests, the Fetch Module may be sufficient.

As of Version 9.0, if <http:header name="Accept-Encoding" value="gzip"/> is specified and the addressed web server supports the gzip compression algorithm, the response is automatically decompressed.
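
As a minimal sketch of the gzip behaviour described above, the following query adds the Accept-Encoding header to a GET request. The target URL is only an assumption for illustration, and whether compression is actually applied depends on the server.

```xquery
(: Ask the server for a gzip-compressed response.
   The URL is a placeholder chosen for this sketch. :)
http:send-request(
  <http:request method='get' href='http://basex.org'>
    <http:header name='Accept-Encoding' value='gzip'/>
  </http:request>
)
```

If the server honours the header, the returned body should already be decompressed, so no extra handling is needed in the query.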

Conventions

All functions in this module are assigned to the http://expath.org/ns/http-client namespace, which is statically bound to the http prefix.
All errors are assigned to the http://expath.org/ns/error namespace, which is statically bound to the exerr prefix.

Functions

http:send-request

Signatures http:send-request($request as element(http:request)?, $href as xs:string?, $bodies as item()*) as item()+
http:send-request($request as element(http:request)) as item()+
http:send-request($request as element(http:request)?, $href as xs:string?) as item()+
Summary Sends an HTTP request and interprets the corresponding response. $request contains the parameters of the HTTP request, such as the HTTP method and headers. It can also contain the URI to which the request will be sent and the body of the request. If the URI is not given with the parameter $href, its value in $request is used instead.
The structure of the http:request element follows the EXPath specification. Both basic and digest authentication are supported.
Errors HC0001: an HTTP error occurred.
HC0002: error parsing the entity content as XML or HTML.
HC0003: with a multipart response, the override-media-type must be either a multipart media type or application/octet-stream.
HC0004: the src attribute on the body element is mutually exclusive with all other attributes (except media-type).
HC0005: the request element is not valid.
HC0006: a timeout occurred waiting for the response.
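
The two ways of supplying the target URI described in the summary can be sketched as follows. Both calls are expected to address the same resource; the URL is an assumption for illustration, and status-only is set so that each call returns only the http:response element.

```xquery
(: URI supplied inside the request element :)
let $a := http:send-request(
  <http:request method='get' href='http://basex.org' status-only='true'/>
)
(: URI supplied through the $href argument :)
let $b := http:send-request(
  <http:request method='get' status-only='true'/>, 'http://basex.org'
)
return ($a/@status/string(), $b/@status/string())
```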

Examples

Status Only

Simple GET request. As the attribute status-only is set to true, only the response element is returned.

Query:

http:send-request(<http:request method='get' status-only='true'/>, 'http://basex.org')

Result:

<http:response status="200" message="OK">
  <http:header name="Date" value="Mon, 14 Mar 2011 20:55:53 GMT"/>
  <http:header name="Content-Length" value="12671"/>
  <http:header name="Expires" value="Mon, 14 Mar 2011 20:57:23 GMT"/>
  <http:header name="Set-Cookie" value="fe_typo_user=d10c9552f9a784d1a73f8b6ebdf5ce63; path=/"/>
  <http:header name="Connection" value="close"/>
  <http:header name="Content-Type" value="text/html; charset=utf-8"/>
  <http:header name="Server" value="Apache/2.2.16"/>
  <http:header name="X-Powered-By" value="PHP/5.3.5"/>
  <http:header name="Cache-Control" value="max-age=90"/>
  <http:body media-type="text/html; charset=utf-8"/>
</http:response>
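
Since the response element is ordinary XML, its attributes and headers can be inspected with path expressions. The following is a small sketch under that assumption; the header lookup is case-insensitive because servers may differ in how they spell header names.

```xquery
let $response := http:send-request(
  <http:request method='get' status-only='true'/>, 'http://basex.org'
)
let $status := xs:integer($response/@status)
(: collect all Content-Type header values, ignoring the case of the name :)
let $type := string-join(
  $response/http:header[lower-case(@name) = 'content-type']/@value, ', '
)
return if ($status = 200) then
  'OK, content type: ' || $type
else
  'Request failed with status ' || $status
```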

Google Homepage

Retrieves the Google search home page with a timeout of 10 seconds. To parse HTML, TagSoup must be available on the class path.

Query:

http:send-request(<http:request method='get' href='http://www.google.com' timeout='10'/>)

Result:

<http:response status="200" message="OK">
  <http:header name="Date" value="Mon, 14 Mar 2011 22:03:25 GMT"/>
  <http:header name="Transfer-Encoding" value="chunked"/>
  <http:header name="Expires" value="-1"/>
  <http:header name="X-XSS-Protection" value="1; mode=block"/>
  <http:header name="Set-Cookie" value="...; expires=Tue, 13-Sep-2011 22:03:25 GMT; path=/; domain=.google.ch; HttpOnly"/>
  <http:header name="Content-Type" value="text/html; charset=ISO-8859-1"/>
  <http:header name="Server" value="gws"/>
  <http:header name="Cache-Control" value="private, max-age=0"/>
  <http:body media-type="text/html; charset=ISO-8859-1"/>
</http:response>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"/>
    <title>Google</title>
    <script>window.google={kEI:"rZB-
    ... 
    </script>
    ...
    </center>
  </body>
</html>

The response content type can also be overridden in order to retrieve HTML pages and other textual data as a plain string (using text/plain) or in binary representation (using application/octet-stream). With the http:header element, a custom user agent can be set. See the following example:

Query:

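(: Request the page as binary data and send a custom user agent. :)
let $binary := http:send-request(
  <http:request method='get'
    override-media-type='application/octet-stream'
    href='http://www.google.com'>
    <http:header name="User-Agent" value="Opera"/>
  </http:request>
)[2]  (: item 1 is the http:response element, item 2 is the body :)
return try {
  (: convert the binary HTML to XML :)
  html:parse($binary)
} catch * {
  'Conversion to XML failed: ' || $err:description
}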
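
The text/plain variant mentioned above can be sketched in the same way. Here the body is expected to arrive as a string, so string functions can be applied directly; the 200-character cut-off is an arbitrary choice for this illustration.

```xquery
let $result := http:send-request(
  <http:request method='get'
    override-media-type='text/plain'
    href='http://www.google.com'/>
)
(: $result[1] is the http:response element, $result[2] the body as a string :)
return substring($result[2], 1, 200)
```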

SVG Data

A response whose content type ends with +xml, e.g. image/svg+xml, is returned as XML.

Query:

http:send-request(<http:request method='get'/>, 'http://upload.wikimedia.org/wikipedia/commons/6/6b/Bitmap_VS_SVG.svg')

Result:

<http:response status="200" message="OK">
  <http:header name="ETag" value="W/&quot;11b6d-4ba15ed4&quot;"/>
  <http:header name="Age" value="9260"/>
  <http:header name="Date" value="Mon, 14 Mar 2011 19:17:10 GMT"/>
  <http:header name="Content-Length" value="72557"/>
  <http:header name="Last-Modified" value="Wed, 17 Mar 2010 22:59:32 GMT"/>
  <http:header name="Content-Type" value="image/svg+xml"/>
  <http:header name="X-Cache-Lookup" value="MISS from knsq22.knams.wikimedia.org:80"/>
  <http:header name="Connection" value="keep-alive"/>
  <http:header name="Server" value="Sun-Java-System-Web-Server/7.0"/>
  <http:header name="X-Cache" value="MISS from knsq22.knams.wikimedia.org"/>
  <http:body media-type="image/svg+xml"/>
</http:response>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="1063" height="638">
  <defs>
    <linearGradient id="lg0">
      <stop stop-color="#3333ff" offset="0"/>
      <stop stop-color="#3f3fff" stop-opacity="0" offset="1"/>
    </linearGradient>
    ...
</svg>

POST Request

POST request to the BaseX REST Service, specifying a username and password.

Query:

let $request :=
  <http:request href='http://localhost:8984/rest'
    method='post' username='admin' password='admin' send-authorization='true'>
    <http:body media-type='application/xml'>
      <query xmlns="http://basex.org/rest">
        <text><![CDATA[
          <html>{
            for $i in 1 to 3
            return <div>Section {$i }</div>
          }</html>
        ]]></text>
      </query>
    </http:body>
  </http:request>
return http:send-request($request)

Result:

<http:response xmlns:http="http://expath.org/ns/http-client" status="200" message="OK">
  <http:header name="Content-Length" value="135"/>
  <http:header name="Content-Type" value="application/xml"/>
  <http:header name="Server" value="Jetty(6.1.26)"/>
  <http:body media-type="application/xml"/>
</http:response>
<html>
  <div>Section 1</div>
  <div>Section 2</div>
  <div>Section 3</div>
</html>
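
The example above sends the credentials right away (send-authorization='true'), which corresponds to basic authentication. A request using digest authentication might look like the following sketch. It assumes that the auth-method attribute from the EXPath specification accepts the value Digest and that the server at the given address is configured for digest authentication; neither is confirmed by this page.

```xquery
(: Assumption: auth-method='Digest' selects digest authentication.
   URL and credentials are taken from the POST example above. :)
let $request :=
  <http:request href='http://localhost:8984/rest' method='get'
    username='admin' password='admin' auth-method='Digest'/>
return http:send-request($request)
```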

Errors

Code    Description
HC0001  An HTTP error occurred.
HC0002  Error parsing the entity content as XML or HTML.
HC0003  With a multipart response, the override-media-type must be either a multipart media type or application/octet-stream.
HC0004  The src attribute on the body element is mutually exclusive with all other attributes (except media-type).
HC0005  The request element is not valid.
HC0006  A timeout occurred waiting for the response.
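
The error codes can be caught selectively with try/catch. The following sketch assumes the error namespace given in the Conventions section; it is declared explicitly here so that the query does not rely on the static binding of the exerr prefix. The one-second timeout is chosen only to make a timeout more likely.

```xquery
declare namespace exerr = 'http://expath.org/ns/error';

try {
  http:send-request(
    <http:request method='get' href='http://www.google.com' timeout='1'/>
  )[1]/@status/string()
} catch exerr:HC0006 {
  (: no response arrived within the timeout :)
  'Timeout: ' || $err:description
} catch exerr:* {
  (: any other error raised in the HTTP client error namespace :)
  'HTTP client error ' || local-name-from-QName($err:code) || ': ' || $err:description
}
```

Catching HC0006 separately keeps timeouts, which may be worth retrying, apart from errors such as an invalid request element (HC0005), which are not.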

Changelog

Version 9.0
  • Updated: support for gzipped content encoding
Version 8.0
  • Added: digest authentication
Version 7.6
  • Updated: http:send-request: HC0002 is raised if the input cannot be parsed or converted to the final data type.
  • Updated: errors use text/plain as media type.