Archive Functions

This module contains functions to handle archives (including ePub, Open Office, JAR, and many other formats). New ZIP and GZIP archives can be created, existing archives can be updated, and the archive entries can be listed and extracted.

Updated:

  • Input archives can now be addressed via file paths and URIs.
  • Archive entries can be deleted by specifying empty arrays as contents.

Conventions

All functions and errors in this module are assigned to the http://basex.org/modules/archive namespace, which is statically bound to the archive prefix.

Content Handling

archive:entries

Signature
archive:entries(
  $archive  as (xs:string|xs:base64Binary|xs:hexBinary)
) as element(archive:entry)*
Summary
Returns the entry descriptors of the specified $archive (supplied as binary or URI/file path). A descriptor contains the following attributes, provided that they are available in the archive format:
  • size: original file size
  • last-modified: timestamp, formatted as xs:dateTime
  • compressed-size: compressed file size
An example:
<archive:entry size="1840" last-modified="2024-03-20T03:30:32" compressed-size="672">
  doc/index.html
</archive:entry>
Errors
error  Processing failed.
Examples
sum(archive:entries(file:read-binary('zip.zip'))/@size)
Sums up the file sizes of all entries of a ZIP file.
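Because the size, last-modified and compressed-size attributes are only present when the archive format supplies them, a query that reads them should tolerate their absence. The following sketch builds a small archive in memory (the entry names and contents are invented for illustration) and reports each entry together with whatever size information is available:

```xquery
(: Illustrative only: entry names and contents are invented for this sketch. :)
let $archive := archive:create(
  ('a.txt', 'docs/b.txt'),
  ('first', 'second file')
)
for $entry in archive:entries($archive)
return map {
  'name': string($entry),
  (: Evaluates to an empty sequence if the descriptor has no size attribute. :)
  'size': $entry/@size ! xs:integer(.)
}
```

Working on an in-memory archive keeps the query independent of the file system; the same expression should apply unchanged to an archive addressed by a file path.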

archive:options

Signature
archive:options(
  $archive  as (xs:string|xs:base64Binary|xs:hexBinary)
) as map(*)
Summary
Returns the options of the specified $archive (supplied as binary or URI/file path) in the format specified by archive:create.
Errors
error   Processing failed.
format  The archive format or the specified option is invalid or not supported.
Examples
{
  "format": "zip",
  "algorithm": "deflate"
}
Returned for a standard ZIP archive.
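Since the returned map uses the same keys as the $options argument of archive:create, the two functions can be combined in a round trip. As a minimal sketch (the entry name and content are placeholders), a GZIP archive is created in memory and its format is read back:

```xquery
(: Placeholder entry name and content; GZIP archives hold a single entry. :)
let $gzip := archive:create('data.txt', 'some text', map { 'format': 'gzip' })
(: Look up the 'format' key of the returned options map. :)
return archive:options($gzip)?format
```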

archive:extract-text

Signature
archive:extract-text(
  $archive   as (xs:string|xs:base64Binary|xs:hexBinary),
  $entries   as xs:string*                                := (),
  $encoding  as xs:string                                 := ()
) as xs:string*
Summary
Extracts entries of the specified $archive (supplied as binary or URI/file path) and returns them as strings. The returned entries can be limited via $entries. The format of the argument is the same as for archive:create (attributes will be ignored). The encoding of the input files can be specified via $encoding.
Errors
encode  The specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if CHECKSTRINGS is turned off.
error   Processing failed.
Examples
let $archive := file:read-binary('documents.zip')
for $entry in archive:entries($archive)[ends-with(., '.txt')]
return archive:extract-text($archive, $entry)
Extracts all .txt files from an archive.
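The $encoding argument matters when an entry was not stored as UTF-8. The following sketch assumes an entry written with the encoding attribute described for archive:create; the file name and text are made up for illustration. The same encoding is passed again when the text is extracted:

```xquery
(: Store a text entry as ISO-8859-1 instead of the default UTF-8. :)
let $archive := archive:create(
  <archive:entry encoding='ISO-8859-1'>umlauts.txt</archive:entry>,
  'äöü'
)
(: Name the entry explicitly and decode it with the encoding used above. :)
return archive:extract-text($archive, 'umlauts.txt', 'ISO-8859-1')
```

If the encoding is omitted here, the bytes would presumably be decoded as UTF-8, which is the situation the encode error describes.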

archive:extract-binary

Signature
archive:extract-binary(
  $archive  as (xs:string|xs:base64Binary|xs:hexBinary),
  $entries  as xs:string*                                := ()
) as xs:base64Binary*
Summary
Extracts entries of the specified $archive (supplied as binary or URI/file path) and returns them as binaries. The returned entries can be limited via $entries. The format of the argument is the same as for archive:create (attributes will be ignored).
Errors
error  Processing failed.
Examples
let $archive  := file:read-binary('archive.zip')
let $entries  := archive:entries($archive)
let $contents := archive:extract-binary($archive)
return for-each-pair($entries, $contents, fn($entry, $content) {
  file:create-dir(replace($entry, "[^/]+$", "")),
  file:write-binary($entry, $content)
})
Unzips all files of archive.zip to the current directory.

Updates

archive:create

Signature
archive:create(
  $entries   as item()*,
  $contents  as item()*,
  $options   as map(*)?  := {}
) as xs:base64Binary
Summary
Creates a new archive from the specified entries and contents.

The $entries argument contains metadata. Its items may be of type xs:string, representing the name of the file, or element(archive:entry), containing the name as string value and additional, optional attributes:

  • last-modified: timestamp, specified as xs:dateTime (default: current time)
  • compression-level: 0 to 9, 0 = uncompressed (default: 8)
  • encoding: for textual entries (default: UTF-8)

An entry may look as follows:
<archive:entry last-modified='2011-11-11T11:11:11' compression-level='8' encoding='US-ASCII'>
  hello.txt
</archive:entry>

The $contents must have one of the following types:

  • Items of type xs:string are treated as text.
  • Items of type xs:base64Binary or xs:hexBinary are treated as binaries.
  • In the case of updates (see below), an empty array indicates that an entry is to be deleted.

The following $options are available:

option     default  description
format     zip      Allowed values are zip and gzip.
algorithm  deflate  Allowed values are deflate and stored (for the zip format).

Errors
descriptor  Entry descriptors contain invalid entry names, timestamps or compression levels.
encode      The specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if CHECKSTRINGS is turned off.
error       Processing failed.
format      The archive format or the specified option is invalid or not supported.
number      The number of specified entries and contents differs.
single      The chosen archive format only allows single entries.
Examples
archive:create(<archive:entry>file.txt</archive:entry>, 'Hello World')
Creates an archive with one file file.txt and returns it as binary.
let $path  := 'audio/'
let $files := file:list($path, true(), '*.mp3')
let $zip   := archive:create($files, $files ! file:read-binary($path || .))
return file:write-binary('mp3.zip', $zip)
Creates an archive mp3.zip, which contains all MP3 files of a local directory.
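The entry attributes, the two content types and the options map can be combined in a single call. The sketch below is only an illustration: the file names, the timestamp and the binary payload are invented, and the options simply restate the documented defaults. It returns the names of the created entries:

```xquery
let $entries := (
  (: Text entry with the highest compression level. :)
  <archive:entry compression-level='9'>notes.txt</archive:entry>,
  (: Binary entry with an explicit modification timestamp. :)
  <archive:entry last-modified='2024-01-01T00:00:00'>payload.bin</archive:entry>
)
let $contents := (
  'plain text content',      (: xs:string is treated as text :)
  xs:hexBinary('CAFEBABE')   (: xs:hexBinary is treated as binary :)
)
let $zip := archive:create(
  $entries,
  $contents,
  map { 'format': 'zip', 'algorithm': 'deflate' }
)
return archive:entries($zip) ! string()
```

The position of an item in $entries determines which item of $contents it is paired with, so both sequences must have the same length.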

archive:update

Signature
archive:update(
  $archive   as (xs:string|xs:base64Binary|xs:hexBinary),
  $entries   as item()*,
  $contents  as item()*
) as xs:base64Binary
Summary
Creates an updated version of the specified $archive (supplied as binary or URI/file path) with new or replaced entries. The format of $entries and $contents is the same as for archive:create.
Errors
descriptor  Entry descriptors contain invalid entry names, timestamps or compression levels.
encode      The specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if CHECKSTRINGS is turned off.
error       Processing failed.
modify      The entries of the given archive cannot be modified.
number      The number of specified entries and contents differs.
Examples
let $archive := archive:update('source.zip', 'delete-me.txt', [])
return file:write-binary('target.zip', $archive)
Removes a file from an archive.
declare variable $input  := "HelloWorld.docx";
declare variable $output := "HelloUniverse.docx";
declare variable $doc    := "word/document.xml";

let $xml := parse-xml(archive:extract-text($input, $doc))
let $updated := $xml update {
  replace value of node .//*[text() = "HELLO WORLD!"] with "HELLO UNIVERSE!"
}
let $archive := archive:update($input, $doc, serialize($updated))
return file:write-binary($output, $archive)
Replaces texts in a Word document.
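Replacing, adding and deleting entries can be expressed in one call, because each content item is matched to the entry at the same position. A minimal sketch on an in-memory archive (all names and texts are placeholders):

```xquery
let $original := archive:create(('a.txt', 'b.txt'), ('old A', 'old B'))
let $updated := archive:update(
  $original,
  ('a.txt', 'c.txt', 'b.txt'),
  (
    'new A',  (: a.txt exists already, so it is replaced :)
    'new C',  (: c.txt is new, so it is added :)
    []        (: an empty array marks b.txt for deletion :)
  )
)
return archive:entries($updated) ! string()
```

The original binary is left untouched; the function returns a new archive, which can then be written with file:write-binary.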

archive:delete

Signature
archive:delete(
  $archive  as (xs:string|xs:base64Binary|xs:hexBinary),
  $entries  as xs:string*
) as xs:base64Binary
Summary
Deletes entries from an $archive (supplied as binary or URI/file path). The format of $entries is the same as for archive:create.
Errors
error   Processing failed.
modify  The entries of the given archive cannot be modified.
Examples
let $zip := file:read-binary('old.zip')
let $entries := archive:entries($zip)[matches(., '\.x?html?$', 'i')]
return file:write-binary('new.zip', archive:delete($zip, $entries))
Deletes all HTML files in an archive and creates a new file.

Convenience Functions

archive:create-from

Signature
archive:create-from(
  $path     as xs:string,
  $options  as map(*)?    := {},
  $entries  as item()*    := ()
) as xs:base64Binary
Summary
This convenience function creates an archive from all files in the specified directory $path. The $options parameter contains archiving options, and the files to be archived can be limited via $entries. The format of the last two arguments is identical to archive:create, with two additional options:
  • recursive: parse all files recursively (default: true; ignored if entries are specified via the last argument).
  • root-dir: use name of supplied directory as archive root directory (default: false).
Errors
error  Processing failed.
Examples
let $zip := archive:create-from('/home/user/')
return file:write-binary('archive.zip', $zip)
Writes the files of a user’s home directory to archive.zip.
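The effect of the recursive option can be checked on a throwaway directory. The sketch below creates a temporary directory with one top-level file and one nested file (all names are invented), archives it non-recursively and lists the resulting entries. It assumes that file functions in a sequence are evaluated in order:

```xquery
(: Temporary directory with invented contents, removed again at the end. :)
let $dir := file:create-temp-dir('archive-demo', '')
return (
  file:write-text($dir || '/top.txt', 'top'),
  file:create-dir($dir || '/sub'),
  file:write-text($dir || '/sub/nested.txt', 'nested'),
  (: With recursive = false, files in sub/ should not be archived. :)
  archive:entries(
    archive:create-from($dir, map { 'recursive': false() })
  ) ! string(),
  file:delete($dir, true())
)
```

Setting root-dir to true in the same map would additionally prefix the entries with the name of the archived directory.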

archive:extract-to

Signature
archive:extract-to(
  $path     as xs:string,
  $archive  as (xs:string|xs:base64Binary|xs:hexBinary),
  $entries  as xs:string*                                := ()
) as empty-sequence()
Summary
This convenience function writes files of an $archive (supplied as binary or URI/file path) to the specified directory $path. The archive entries to be written can be restricted via $entries. The format of the argument is the same as for archive:create (attributes will be ignored).
Errors
error  Processing failed.
Examples
archive:extract-to('.', 'archive.zip')
Unzips all files of archive.zip to the current directory.
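To write only some entries, their names are passed as third argument. In the following sketch, an in-memory archive with two invented entries is extracted to a temporary directory, restricted to one of them, and the files that were written are listed:

```xquery
let $archive := archive:create(
  ('keep/a.txt', 'skip/b.txt'),
  ('A', 'B')
)
(: Temporary target directory, removed again at the end. :)
let $target := file:create-temp-dir('extract-demo', '')
return (
  (: Only keep/a.txt is written; skip/b.txt stays in the archive. :)
  archive:extract-to($target, $archive, 'keep/a.txt'),
  file:list($target, true()),
  file:delete($target, true())
)
```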

archive:write

Signature
archive:write(
  $path      as xs:string,
  $entries   as item()*,
  $contents  as item()*,
  $options   as map(*)?    := {}
) as empty-sequence()
Summary
This convenience function creates a new archive from the specified $entries and $contents and writes it to $path. See archive:create for more details.
Errors
descriptor  Entry descriptors contain invalid entry names, timestamps or compression levels.
encode      The specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if CHECKSTRINGS is turned off.
error       Processing failed.
format      The archive format or the specified option is invalid or not supported.
number      The number of specified entries and contents differs.
single      The chosen archive format only allows single entries.
Examples
let $files := file:children('music')[ends-with(., 'mp3')]
return archive:write(
  'music.zip',
  ('info.txt', $files ! file:name(.)),
  ('Archive with MP3 files', $files ! file:read-binary(.))
)
All mp3 files from a directory are zipped and written to a file, along with an info file.

archive:refresh

Added: New function for updating existing archives.

Signature
archive:refresh(
  $path      as xs:string,
  $entries   as item()*,
  $contents  as item()*
) as empty-sequence()
Summary
This convenience function updates a local ZIP archive located at $path with new or replaced entries:
  • The format of $entries and $contents is the same as for archive:create.
  • If the path points to a remote resource or a GZIP archive, it is rejected.
  • Custom compression levels are ignored.
Errors
descriptor  Entry descriptors contain invalid entry names, timestamps or compression levels.
encode      The specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if CHECKSTRINGS is turned off.
error       Processing failed.
number      The number of specified entries and contents differs.
zip         The input is expected to be a local ZIP archive.
Examples
archive:refresh(
  'archive.zip',
  ('readme.txt', 'changelog.txt'),
  ('Read this info carefully: ...', 'These are the latest changes: ...')
)
The two files readme.txt and changelog.txt are added to (or updated in) archive.zip.
archive:refresh('archive.zip', 'delete-me.txt', [])
The file delete-me.txt is removed from archive.zip.

Errors

Code        Description
descriptor  Entry descriptors contain invalid entry names, timestamps or compression levels.
encode      The specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if CHECKSTRINGS is turned off.
error       Processing failed.
format      The archive format or the specified option is invalid or not supported.
modify      The entries of the given archive cannot be modified.
number      The number of specified entries and contents differs.
single      The chosen archive format only allows single entries.
zip         The input is expected to be a local ZIP archive.
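As the error codes are bound to the module namespace, they can be caught by name with the archive prefix. The following sketch deliberately supplies two entry names but only one content item (the names are placeholders), which is expected to raise the number error:

```xquery
try {
  (: Two entries, one content item: the counts differ on purpose. :)
  archive:create(('a.txt', 'b.txt'), 'only one content item')
} catch archive:number {
  'Mismatch: ' || $err:description
} catch archive:* {
  (: Fallback for any other error raised by this module. :)
  'Other archive error: ' || local-name-from-QName($err:code)
}
```

Listing the specific code first and the archive:* wildcard second keeps the handling of expected failures separate from the generic fallback.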

Changelog

Version 11.0
  • Added: archive:refresh: New function for updating existing archives.
  • Updated: Input archives can be addressed via file paths and URIs.
  • Updated: Archive entries can be deleted by specifying empty arrays as contents.
Version 9.6
Version 9.0
  • Updated: archive:create-from: options added
  • Updated: error codes updated; errors now use the module namespace
Version 8.5
Version 8.3
Version 7.3
  • Added: New module added.
