Difference between revisions of "Archive Module"

From BaseX Documentation
Line 1: Line 1:
This [[Module Library|XQuery Module]] contains functions to handle ZIP archives. New ZIP archives can be created, existing archives can be updated, and the archive entries can be listed and extracted. This module may soon replace the existing [[ZIP Module]] ([http://spex.basex.org/index.php?title=ZIP_Module more information]).
+
This [[Module Library|XQuery Module]] contains functions to handle archives. New ZIP and GZIP archives can be created, existing archives can be updated, and the archive entries can be listed and extracted. This module may soon replace the existing [[ZIP Module]] ([http://spex.basex.org/index.php?title=ZIP_Module more information]).
  
 
=Conventions=
 
=Conventions=
Line 15: Line 15:
 
|-
 
|-
 
| '''Summary'''
 
| '''Summary'''
|Creates a new ZIP archive from the specified entries and contents.<br/>The {{Code|$entries}} argument contains meta information required to create new ZIP entries. All items may either be of type {{Code|xs:string}}, representing the entry name, or {{Code|element(archive:entry)}}, containing the name as text node and additional, optional attributes:
+
|Creates a new archive from the specified entries and contents.<br/>The {{Code|$entries}} argument contains meta information required to create new entries. All items may either be of type {{Code|xs:string}}, representing the entry name, or {{Code|element(archive:entry)}}, containing the name as text node and additional, optional attributes:
 
* {{Code|last-modified}}: timestamp, specified as xs:dateTime (default: current time)
 
* {{Code|last-modified}}: timestamp, specified as xs:dateTime (default: current time)
 
* {{Code|compression-level}}: 0-9, 0 = uncompressed (default: 8)
 
* {{Code|compression-level}}: 0-9, 0 = uncompressed (default: 8)
Line 65: Line 65:
 
|-
 
|-
 
| width='90' | '''Signatures'''
 
| width='90' | '''Signatures'''
|{{Func|archive:entries|$zip as xs:base64Binary|element(archive:entry)*}}<br />
+
|{{Func|archive:entries|$archive as xs:base64Binary|element(archive:entry)*}}<br />
 
|-
 
|-
 
| '''Summary'''
 
| '''Summary'''
|Returns the entry descriptors of the given zip archive. A descriptor contains the following attributes, provided that they are available in the archive format:
+
|Returns the entry descriptors of the given archive. A descriptor contains the following attributes, provided that they are available in the archive format:
 
* {{Code|size}}: original file size
 
* {{Code|size}}: original file size
 
* {{Code|last-modified}}: timestamp, formatted as xs:dateTime
 
* {{Code|last-modified}}: timestamp, formatted as xs:dateTime
Line 94: Line 94:
 
|-
 
|-
 
| width='90' | '''Signatures'''
 
| width='90' | '''Signatures'''
|{{Func|archive:options|$zip as xs:base64Binary|element(archive:options)}}<br />
+
|{{Func|archive:options|$archive as xs:base64Binary|element(archive:options)}}<br />
 
|-
 
|-
 
| '''Summary'''
 
| '''Summary'''
|Returns the options of the given zip archive in the format specified by [[#archive:create|archive:create]].
+
|Returns the options of the given archive in the format specified by [[#archive:create|archive:create]].
 
|-
 
|-
 
| '''Errors'''
 
| '''Errors'''
Line 117: Line 117:
 
|-
 
|-
 
| width='90' | '''Signatures'''
 
| width='90' | '''Signatures'''
|{{Func|archive:extract-text|$zip as xs:base64Binary|xs:string*}}<br/>{{Func|archive:extract-text|$zip as xs:base64Binary, $entries as item()*|xs:string*}}<br/>{{Func|archive:extract-text|$zip as xs:base64Binary, $entries as item()*, $encoding as xs:string|xs:string*}}<br/>
+
|{{Func|archive:extract-text|$archive as xs:base64Binary|xs:string*}}<br/>{{Func|archive:extract-text|$archive as xs:base64Binary, $entries as item()*|xs:string*}}<br/>{{Func|archive:extract-text|$archive as xs:base64Binary, $entries as item()*, $encoding as xs:string|xs:string*}}<br/>
 
|-
 
|-
 
| '''Summary'''
 
| '''Summary'''
Line 139: Line 139:
 
|-
 
|-
 
| width='90' | '''Signatures'''
 
| width='90' | '''Signatures'''
|{{Func|archive:extract-binary|$zip as xs:base64Binary|xs:string*}}<br/>{{Func|archive:extract-binary|$zip as xs:base64Binary, $entries as item()*|xs:base64Binary*}}
+
|{{Func|archive:extract-binary|$archive as xs:base64Binary|xs:string*}}<br/>{{Func|archive:extract-binary|$archive as xs:base64Binary, $entries as item()*|xs:base64Binary*}}
 
|-
 
|-
 
| '''Summary'''
 
| '''Summary'''
Line 163: Line 163:
 
|-
 
|-
 
| width='90' | '''Signatures'''
 
| width='90' | '''Signatures'''
|{{Func|archive:update|$zip as xs:base64Binary, $entries as item()*, $contents as item()*|xs:base64Binary}}
+
|{{Func|archive:update|$archive as xs:base64Binary, $entries as item()*, $contents as item()*|xs:base64Binary}}
 
|-
 
|-
 
| '''Summary'''
 
| '''Summary'''
|Adds new entries and replaces existing entries in a zip archive.<br/>The format of {{Code|$entries}} and {{Code|$contents}} is the same as for [[#archive:create|archive:create]].
+
|Adds new entries and replaces existing entries in an archive.<br/>The format of {{Code|$entries}} and {{Code|$contents}} is the same as for [[#archive:create|archive:create]].
 
|-
 
|-
 
| '''Errors'''
 
| '''Errors'''
Line 193: Line 193:
 
|-
 
|-
 
| width='90' | '''Signatures'''
 
| width='90' | '''Signatures'''
|{{Func|archive:delete|$zip as xs:base64Binary, $entries as item()*|xs:base64Binary}}
+
|{{Func|archive:delete|$archive as xs:base64Binary, $entries as item()*|xs:base64Binary}}
 
|-
 
|-
 
| '''Summary'''
 
| '''Summary'''
|Deletes entries from a zip archive.<br/>The format of {{Code|$entries}} is the same as for [[#archive:create|archive:create]].
+
|Deletes entries from an archive.<br/>The format of {{Code|$entries}} is the same as for [[#archive:create|archive:create]].
 
|-
 
|-
 
| '''Errors'''
 
| '''Errors'''
Line 235: Line 235:
 
|-
 
|-
 
|{{Code|ARCH9999}}
 
|{{Code|ARCH9999}}
|ZIP processing failed for some other reason.
+
|Archive processing failed for some other reason.
 
|}
 
|}
  

Revision as of 11:57, 26 June 2012

This XQuery Module contains functions to handle archives. New ZIP and GZIP archives can be created, existing archives can be updated, and the archive entries can be listed and extracted. This module may soon replace the existing ZIP Module (more information).

Conventions

All functions in this module are assigned to the http://basex.org/modules/archive namespace, which is statically bound to the archive prefix.
All errors are assigned to the http://basex.org/errors namespace, which is statically bound to the bxerr prefix.

Functions

archive:create

Signatures archive:create($entries as item(), $contents as item()*) as xs:base64Binary
archive:create($entries as item(), $contents as item()*, $options as item()) as xs:base64Binary
Summary Creates a new archive from the specified entries and contents.
The $entries argument contains meta information required to create new entries. All items may either be of type xs:string, representing the entry name, or element(archive:entry), containing the name as text node and additional, optional attributes:
  • last-modified: timestamp, specified as xs:dateTime (default: current time)
  • compression-level: 0-9, 0 = uncompressed (default: 8)
  • encoding: for textual entries (default: UTF-8)

An example:

<archive:entry last-modified='2011-11-11T11:11:11'
               compression-level='9'
               encoding='US-ASCII'>hello.txt</archive:entry>

The actual $contents must be xs:string or xs:base64Binary items.
The $options parameter contains archiving options, which can either be specified

  • as children of an <archive:options/> element:
<archive:options>
  <archive:format value="zip"/>
  <archive:algorithm value="deflate"/>
</archive:options>
  • as map, which contains all key/value pairs:
map { "format" := "zip", "algorithm" := "deflate" }

Currently, the following combinations are supported (all others will be rejected):

  • zip: algorithm may be stored or deflate
  • gzip: algorithm may be deflate
Errors ARCH0001: the number of entries and contents differs.
ARCH0002: the specified option or its value is invalid or not supported.
ARCH0003: entry descriptors contain invalid entry names, timestamps or compression levels.
ARCH0004: the specified encoding is invalid or not supported, or the string conversion failed.
ARCH0006: the chosen archive format only allows single entries.
ARCH9999: archive creation failed for some other reason.
FORG0006: an argument has a wrong type.
Examples The following one-liner creates an archive with one file file.txt (the result is returned as xs:base64Binary and is not written to disk):
archive:create(<archive:entry>file.txt</archive:entry>, 'Hello World')

The following expression creates an archive mp3.zip, which contains all MP3 files of a local directory:

let $path  := 'audio/'
let $files := file:list($path, true(), '*.mp3')
let $zip   := archive:create(
  $files ! element archive:entry { . },
  $files ! file:read-binary($path || .))
return file:write-binary('mp3.zip', $zip)
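
The $options argument described in the summary could be used as sketched below. This is only an illustration: the output file name file.txt.gz is an assumption, and the map notation follows the syntax shown above for this revision of the module. Since gzip only allows a single entry, exactly one entry and one content item are passed:

```xquery
(: Sketch: create a GZIP archive with a single entry, passing options as a map. :)
(: The file name 'file.txt.gz' is hypothetical. :)
let $gzip := archive:create(
  <archive:entry>file.txt</archive:entry>,
  'Hello World',
  map { "format" := "gzip", "algorithm" := "deflate" }
)
return file:write-binary('file.txt.gz', $gzip)
```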

archive:entries

Signatures archive:entries($archive as xs:base64Binary) as element(archive:entry)*
Summary Returns the entry descriptors of the given archive. A descriptor contains the following attributes, provided that they are available in the archive format:
  • size: original file size
  • last-modified: timestamp, formatted as xs:dateTime
  • compressed-size: compressed file size

An example:

<archive:entry size="1840" last-modified="2009-03-20T03:30:32" compressed-size="672">
  doc/index.html
</archive:entry>
Errors ARCH9999: archive processing failed for some other reason.
Examples Sums up the file sizes of all entries of a ZIP file:
sum(archive:entries(file:read-binary('zip.zip'))/@size)
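
The descriptor attributes listed in the summary can also be read per entry. The following sketch, which assumes the same zip.zip file as above, lists each entry name together with its original and compressed size. Attributes that are not available in the archive format would simply yield empty strings here:

```xquery
(: Sketch: print name, original size and compressed size of each entry. :)
for $entry in archive:entries(file:read-binary('zip.zip'))
return normalize-space($entry) || ': ' ||
       $entry/@size || ' -> ' || $entry/@compressed-size
```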

archive:options

Signatures archive:options($archive as xs:base64Binary) as element(archive:options)
Summary Returns the options of the given archive in the format specified by archive:create.
Errors ARCH0002: the packing format is not supported.
ARCH9999: archive processing failed for some other reason.
Examples A standard ZIP archive will return the following options:
<archive:options xmlns:archive="http://basex.org/modules/archive">
  <archive:format value="zip"/>
  <archive:algorithm value="deflate"/>
</archive:options>
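
To inspect a single option, the returned element can be navigated with a path expression. A minimal sketch, assuming a local file named archive.zip (hypothetical) and the element structure shown in the example above:

```xquery
(: Sketch: return the format name of an archive as a string. :)
let $archive := file:read-binary('archive.zip')
let $options := archive:options($archive)
return string($options/archive:format/@value)
```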

archive:extract-text

Signatures archive:extract-text($archive as xs:base64Binary) as xs:string*
archive:extract-text($archive as xs:base64Binary, $entries as item()*) as xs:string*
archive:extract-text($archive as xs:base64Binary, $entries as item()*, $encoding as xs:string) as xs:string*
Summary Extracts archive entries and returns them as texts.
The returned entries can be limited via $entries. The format of the argument is the same as for archive:create (attributes will be ignored).
The encoding of the input files can be specified via $encoding.
Errors ARCH0004: the specified encoding is invalid or not supported, or the string conversion failed.
ARCH9999: archive processing failed for some other reason.
Examples The following expression extracts all .txt files from an archive:
let $archive := file:read-binary("documents.zip")
for $entry in archive:entries($archive)[ends-with(., '.txt')]
return archive:extract-text($archive, $entry)
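
The three-argument signature may be useful if an entry is not encoded in UTF-8. The following sketch assumes that documents.zip contains an entry named readme.txt stored in ISO-8859-1; both the entry name and the encoding are hypothetical:

```xquery
(: Sketch: extract a single entry and decode it with an explicit encoding. :)
let $archive := file:read-binary("documents.zip")
return archive:extract-text($archive, 'readme.txt', 'ISO-8859-1')
```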

archive:extract-binary

Signatures archive:extract-binary($archive as xs:base64Binary) as xs:base64Binary*
archive:extract-binary($archive as xs:base64Binary, $entries as item()*) as xs:base64Binary*
Summary Extracts archive entries and returns them as binaries.
The returned entries can be limited via $entries. The format of the argument is the same as for archive:create (attributes will be ignored).
Errors ARCH9999: archive processing failed for some other reason.
Examples This example unzips all files of an archive to the current directory:
let $archive  := file:read-binary('archive.zip')
let $entries  := archive:entries($archive)
let $contents := archive:extract-binary($archive)
for $entry at $p in $entries
return file:write-binary($entry, $contents[$p])

archive:update

Signatures archive:update($archive as xs:base64Binary, $entries as item()*, $contents as item()*) as xs:base64Binary
Summary Adds new entries and replaces existing entries in an archive.
The format of $entries and $contents is the same as for archive:create.
Errors ARCH0001: the number of entries and contents differs.
ARCH0003: entry descriptors contain invalid entry names, timestamps, compression levels or encodings.
ARCH0004: the specified encoding is invalid or not supported, or the string conversion failed.
ARCH0005: the entries of the given archive cannot be modified.
ARCH9999: archive processing failed for some other reason.
FORG0006: (some of) the contents are not of type xs:string or xs:base64Binary.
Examples This example replaces texts in a Word document:
declare variable $input  := "HelloWorld.docx";
declare variable $output := "HelloUniverse.docx";
declare variable $doc    := "word/document.xml";
 
let $archive := file:read-binary($input)
let $entry   :=
  copy $c := fn:parse-xml(archive:extract-text($archive, $doc))
  modify replace value of node $c//*[text() = "HELLO WORLD!"] with "HELLO UNIVERSE!"
  return fn:serialize($c)
let $updated := archive:update($archive, $doc, $entry)
return file:write-binary($output, $updated)
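
Adding a new entry works the same way as replacing one. The sketch below assumes hypothetical file names (old.zip, new.zip, notes.txt) and uses an archive:entry element, as described for archive:create, to request the highest compression level for the added entry:

```xquery
(: Sketch: add a text entry to an existing archive and write the result. :)
let $archive := file:read-binary('old.zip')
let $updated := archive:update(
  $archive,
  <archive:entry compression-level='9'>notes.txt</archive:entry>,
  'Some notes'
)
return file:write-binary('new.zip', $updated)
```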

archive:delete

Signatures archive:delete($archive as xs:base64Binary, $entries as item()*) as xs:base64Binary
Summary Deletes entries from an archive.
The format of $entries is the same as for archive:create.
Errors ARCH0005: the entries of the given archive cannot be modified.
ARCH9999: archive processing failed for some other reason.
Examples This example deletes all HTML files in an archive and creates a new file:
let $zip := file:read-binary('old.zip')
let $entries := archive:entries($zip)[matches(., '\.x?html?$', 'i')]
return file:write-binary('new.zip', archive:delete($zip, $entries))

Errors

Code Description
ARCH0001 The number of specified entries and contents differs.
ARCH0002 The packing format or the specified option is invalid or not supported.
ARCH0003 Entry descriptors contain invalid entry names, timestamps or compression levels.
ARCH0004 The specified encoding is invalid or not supported, or the string conversion failed.
ARCH0005 The entries of the given archive cannot be modified.
ARCH0006 The chosen archive format only allows single entries.
ARCH9999 Archive processing failed for some other reason.
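
Since all errors are bound to the bxerr prefix, they can be caught by name. The following is a sketch that assumes the XQuery 3.0 try/catch expression is available in the processor. Two entry names are passed with only one content item, which is expected to raise ARCH0001:

```xquery
(: Sketch: catch the error raised when entries and contents differ in number. :)
try {
  archive:create(('a.txt', 'b.txt'), 'only one content item')
} catch bxerr:ARCH0001 {
  'The number of entries and contents differs.'
}
```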

Changelog

The module was introduced with Version 7.3.