Difference between revisions of "Archive Module"

From BaseX Documentation
(8 intermediate revisions by the same user not shown)
Line 20: Line 20:
 
* {{Code|encoding}}: for textual entries (default: UTF-8)
 
* {{Code|encoding}}: for textual entries (default: UTF-8)
 
An example:
 
An example:
<pre class="brush:xml">
+
<syntaxhighlight lang="xml">
 
<archive:entry last-modified='2011-11-11T11:11:11'
 
<archive:entry last-modified='2011-11-11T11:11:11'
 
               compression-level='8'
 
               compression-level='8'
 
               encoding='US-ASCII'>hello.txt</archive:entry>
 
               encoding='US-ASCII'>hello.txt</archive:entry>
</pre>
+
</syntaxhighlight>
 
The actual {{Code|$contents}} must be {{Code|xs:string}} or {{Code|xs:base64Binary}} items.<br/>
 
The actual {{Code|$contents}} must be {{Code|xs:string}} or {{Code|xs:base64Binary}} items.<br/>
 
The {{Code|$options}} parameter contains archiving options:
 
The {{Code|$options}} parameter contains archiving options:
Line 31: Line 31:
 
|-
 
|-
 
| '''Errors'''
 
| '''Errors'''
|{{Error|number|#Errors}} the number of entries and contents differs.<br />{{Error|format|#Errors}} the specified option or its value is invalid or not supported.<br />{{Error|descriptor|#Errors}} entry descriptors contain invalid entry names, timestamps or compression levels.<br/>{{Error|encode|#Errors}} the specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if the <code>[[Options#CHECKSTRINGS|CHECKSTRINGS]]</code> option is turned off.<br/>{{Error|single|#Errors}} the chosen archive format only allows single entries.<br />{{Error|error|#Errors}} archive creation failed for some other reason.
+
|{{Error|number|#Errors}} the number of entries and contents differs.<br />{{Error|format|#Errors}} the specified option or its value is invalid or not supported.<br />{{Error|descriptor|#Errors}} entry descriptors contain invalid entry names, timestamps or compression levels.<br/>{{Error|encode|#Errors}} the specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if {{Option|CHECKSTRINGS}} is turned off.<br/>{{Error|single|#Errors}} the chosen archive format only allows single entries.<br />{{Error|error|#Errors}} archive creation failed for some other reason.
 
|-
 
|-
 
| '''Examples'''
 
| '''Examples'''
 
|The following one-liner creates an archive {{Code|archive.zip}} with one file {{Code|file.txt}}:
 
|The following one-liner creates an archive {{Code|archive.zip}} with one file {{Code|file.txt}}:
<pre class="brush:xquery">
+
<syntaxhighlight lang="xquery">
 
archive:create(<archive:entry>file.txt</archive:entry>, 'Hello World')
 
archive:create(<archive:entry>file.txt</archive:entry>, 'Hello World')
</pre>
+
</syntaxhighlight>
 
The following function creates an archive {{Code|mp3.zip}}, which contains all MP3 files of a local directory:
 
The following function creates an archive {{Code|mp3.zip}}, which contains all MP3 files of a local directory:
<pre class="brush:xquery">
+
<syntaxhighlight lang="xquery">
 
let $path  := 'audio/'
 
let $path  := 'audio/'
 
let $files := file:list($path, true(), '*.mp3')
 
let $files := file:list($path, true(), '*.mp3')
let $zip  := archive:create(
+
let $zip  := archive:create($files,
  $files ! element archive:entry { . },
+
   for $file in $files
   $files ! file:read-binary($path || .))
+
  return file:read-binary($path || $file)
return file:write-binary('mp3.zip', $zip)</pre>
+
)
 +
return file:write-binary('mp3.zip', $zip)</syntaxhighlight>
 
|}
 
|}
  
 
==archive:create-from==
 
==archive:create-from==
 
{{Mark|Updated with Version 9.2}}: options added.
 
  
 
{| width='100%'
 
{| width='100%'
Line 67: Line 66:
 
| '''Examples'''
 
| '''Examples'''
 
|This example writes the files of a user’s home directory to <code>archive.zip</code>:
 
|This example writes the files of a user’s home directory to <code>archive.zip</code>:
<pre class="brush:xquery">
+
<syntaxhighlight lang="xquery">
 
let $zip := archive:create-from('/home/user/')
 
let $zip := archive:create-from('/home/user/')
 
return file:write-binary('archive.zip', $zip)
 
return file:write-binary('archive.zip', $zip)
</pre>
+
</syntaxhighlight>
 
|}
 
|}
  
Line 86: Line 85:
 
* {{Code|compressed-size}}: compressed file size
 
* {{Code|compressed-size}}: compressed file size
 
An example:
 
An example:
<pre class="brush:xml">
+
<syntaxhighlight lang="xml">
 
<archive:entry size="1840" last-modified="2009-03-20T03:30:32" compressed-size="672">
 
<archive:entry size="1840" last-modified="2009-03-20T03:30:32" compressed-size="672">
 
   doc/index.html
 
   doc/index.html
 
</archive:entry>
 
</archive:entry>
</pre>
+
</syntaxhighlight>
 
|-
 
|-
 
| '''Errors'''
 
| '''Errors'''
Line 97: Line 96:
 
|'''Examples'''
 
|'''Examples'''
 
|Sums up the file sizes of all entries of a JAR file:
 
|Sums up the file sizes of all entries of a JAR file:
<pre class="brush:xquery">
+
<syntaxhighlight lang="xquery">
 
sum(archive:entries(file:read-binary('zip.zip'))/@size)
 
sum(archive:entries(file:read-binary('zip.zip'))/@size)
</pre>
+
</syntaxhighlight>
 
|}
 
|}
  
Line 117: Line 116:
 
| '''Examples'''
 
| '''Examples'''
 
|A standard ZIP archive will return the following options:
 
|A standard ZIP archive will return the following options:
<pre class="brush:xquery">
+
<syntaxhighlight lang="xquery">
 
map {
 
map {
 
   "format": "zip",
 
   "format": "zip",
 
   "algorithm": "deflate"
 
   "algorithm": "deflate"
 
}
 
}
</pre>
+
</syntaxhighlight>
 
|}
 
|}
  
Line 136: Line 135:
 
|-
 
|-
 
| '''Errors'''
 
| '''Errors'''
|{{Error|encode|#Errors}} the specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if the <code>[[Options#CHECKSTRINGS|CHECKSTRINGS]]</code> option is turned off.<br />{{Error|error|#Errors}} archive creation failed for some other reason.
+
|{{Error|encode|#Errors}} the specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if {{Option|CHECKSTRINGS}} is turned off.<br />{{Error|error|#Errors}} archive creation failed for some other reason.
 
|-
 
|-
 
| '''Examples'''
 
| '''Examples'''
 
|The following expression extracts all {{Code|.txt}} files from an archive:
 
|The following expression extracts all {{Code|.txt}} files from an archive:
<pre class="brush:xquery">
+
<syntaxhighlight lang="xquery">
 
let $archive := file:read-binary("documents.zip")
 
let $archive := file:read-binary("documents.zip")
 
for $entry in archive:entries($archive)[ends-with(., '.txt')]
 
for $entry in archive:entries($archive)[ends-with(., '.txt')]
 
return archive:extract-text($archive, $entry)
 
return archive:extract-text($archive, $entry)
</pre>
+
</syntaxhighlight>
 
|}
 
|}
  
Line 162: Line 161:
 
| '''Examples'''
 
| '''Examples'''
 
|This example unzips all files of an archive to the current directory:
 
|This example unzips all files of an archive to the current directory:
<pre class="brush:xquery">
+
<syntaxhighlight lang="xquery">
 
let $archive  := file:read-binary('archive.zip')
 
let $archive  := file:read-binary('archive.zip')
 
let $entries  := archive:entries($archive)
 
let $entries  := archive:entries($archive)
Line 169: Line 168:
 
   file:create-dir(replace($entry, "[^/]+$", "")),
 
   file:create-dir(replace($entry, "[^/]+$", "")),
 
   file:write-binary($entry, $content)
 
   file:write-binary($entry, $content)
})</pre>
+
})</syntaxhighlight>
 
|}
 
|}
  
Line 187: Line 186:
 
| '''Examples'''
 
| '''Examples'''
 
|The following expression unzips all files of an archive to the current directory:
 
|The following expression unzips all files of an archive to the current directory:
<pre class="brush:xquery">
+
<syntaxhighlight lang="xquery">
 
archive:extract-to('.', file:read-binary('archive.zip'))
 
archive:extract-to('.', file:read-binary('archive.zip'))
</pre>
+
</syntaxhighlight>
 
|}
 
|}
  
Line 203: Line 202:
 
|-
 
|-
 
| '''Errors'''
 
| '''Errors'''
|{{Error|number|#Errors}} the number of entries and contents differs.<br />{{Error|descriptor|#Errors}} entry descriptors contain invalid entry names, timestamps, compression levels or encodings.<br/>{{Error|encode|#Errors}} the specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if the <code>[[Options#CHECKSTRINGS|CHECKSTRINGS]]</code> option is turned off.<br />{{Error|modify|#Errors}} the entries of the given archive cannot be modified.<br/>{{Error|error|#Errors}} archive creation failed for some other reason.
+
|{{Error|number|#Errors}} the number of entries and contents differs.<br />{{Error|descriptor|#Errors}} entry descriptors contain invalid entry names, timestamps, compression levels or encodings.<br/>{{Error|encode|#Errors}} the specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if {{Option|CHECKSTRINGS}} is turned off.<br />{{Error|modify|#Errors}} the entries of the given archive cannot be modified.<br/>{{Error|error|#Errors}} archive creation failed for some other reason.
 
|-
 
|-
 
| '''Examples'''
 
| '''Examples'''
 
|This example replaces texts in a Word document:
 
|This example replaces texts in a Word document:
<pre class="brush:xquery">
+
<syntaxhighlight lang="xquery">
 
declare variable $input  := "HelloWorld.docx";
 
declare variable $input  := "HelloWorld.docx";
 
declare variable $output := "HelloUniverse.docx";
 
declare variable $output := "HelloUniverse.docx";
Line 219: Line 218:
 
let $updated := archive:update($archive, $doc, $entry)
 
let $updated := archive:update($archive, $doc, $entry)
 
return file:write-binary($output, $updated)
 
return file:write-binary($output, $updated)
</pre>
+
</syntaxhighlight>
 
|}
 
|}
  
Line 237: Line 236:
 
| '''Examples'''
 
| '''Examples'''
 
|This example deletes all HTML files in an archive and creates a new file:
 
|This example deletes all HTML files in an archive and creates a new file:
<pre class="brush:xquery">
+
<syntaxhighlight lang="xquery">
 
let $zip := file:read-binary('old.zip')
 
let $zip := file:read-binary('old.zip')
 
let $entries := archive:entries($zip)[matches(., '\.x?html?$', 'i')]
 
let $entries := archive:entries($zip)[matches(., '\.x?html?$', 'i')]
 
return file:write-binary('new.zip', archive:delete($zip, $entries))
 
return file:write-binary('new.zip', archive:delete($zip, $entries))
</pre>
+
</syntaxhighlight>
 
|}
 
|}
  
Line 254: Line 253:
 
|-
 
|-
 
|{{Code|encode}}
 
|{{Code|encode}}
|The specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if the <code>[[Options#CHECKSTRINGS|CHECKSTRINGS]]</code> option is turned off.
+
|The specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if {{Option|CHECKSTRINGS}} is turned off.
 
|-
 
|-
 
|{{Code|error}}
 
|{{Code|error}}
Line 273: Line 272:
  
 
=Changelog=
 
=Changelog=
 +
 +
;Version 9.0
 +
 +
* Updated: [[#archive:create-from|archive:create-from]]: options added
  
 
;Version 9.0
 
;Version 9.0

Revision as of 13:53, 27 February 2020

This XQuery Module contains functions to handle archives (including ePub, Open Office, JAR, and many other formats). New ZIP and GZIP archives can be created, existing archives can be updated, and the archive entries can be listed and extracted. The archive:extract-binary function includes an example for writing the contents of an archive to disk.

Conventions

All functions and errors in this module are assigned to the http://basex.org/modules/archive namespace, which is statically bound to the archive prefix.

Functions

archive:create

Signatures archive:create($entries as item()*, $contents as item()*) as xs:base64Binary
archive:create($entries as item()*, $contents as item()*, $options as map(*)?) as xs:base64Binary
Summary Creates a new archive from the specified entries and contents.
The $entries argument contains meta information required to create new entries. All items may either be of type xs:string, representing the entry name, or element(archive:entry), containing the name as text node and additional, optional attributes:
  • last-modified: timestamp, specified as xs:dateTime (default: current time)
  • compression-level: 0-9, 0 = uncompressed (default: 8)
  • encoding: for textual entries (default: UTF-8)

An example:
<syntaxhighlight lang="xml">
<archive:entry last-modified='2011-11-11T11:11:11'
               compression-level='8'
               encoding='US-ASCII'>hello.txt</archive:entry>
</syntaxhighlight>
The actual $contents must be xs:string or xs:base64Binary items.
The $options parameter contains archiving options:

  • format: allowed values are zip and gzip. zip is the default.
  • algorithm: allowed values are deflate and stored (for the zip format). deflate is the default.
Errors number: the number of entries and contents differs.
format: the specified option or its value is invalid or not supported.
descriptor: entry descriptors contain invalid entry names, timestamps or compression levels.
encode: the specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if CHECKSTRINGS is turned off.
single: the chosen archive format only allows single entries.
error: archive creation failed for some other reason.
Examples The following one-liner creates an archive with one file file.txt:

<syntaxhighlight lang="xquery">
archive:create(<archive:entry>file.txt</archive:entry>, 'Hello World')
</syntaxhighlight>
The following query creates an archive mp3.zip, which contains all MP3 files of a local directory:
<syntaxhighlight lang="xquery">
let $path  := 'audio/'
let $files := file:list($path, true(), '*.mp3')
let $zip   := archive:create($files,
  for $file in $files
  return file:read-binary($path || $file)
)
return file:write-binary('mp3.zip', $zip)
</syntaxhighlight>
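Entry descriptors and the $options map can be combined in one call. The following is a minimal sketch, not taken from the original examples: the entry names, contents and the output file example.zip are placeholders, and the option values are simply the documented defaults written out explicitly.

```xquery
(: Sketch: one entry given as a plain string, one as an archive:entry element.
   All names and contents are illustrative placeholders. :)
let $entries := (
  'readme.txt',
  <archive:entry compression-level='9'
                 encoding='US-ASCII'>notes/todo.txt</archive:entry>
)
(: one content item per entry, in the same order :)
let $contents := ('Read me first.', 'Nothing to do.')
(: documented defaults, spelled out for illustration :)
let $options := map { 'format': 'zip', 'algorithm': 'deflate' }
let $zip := archive:create($entries, $contents, $options)
return file:write-binary('example.zip', $zip)
```

The number of entries and contents must match; otherwise the number error is raised.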

archive:create-from

Signatures archive:create-from($path as xs:string) as xs:base64Binary
archive:create-from($path as xs:string, $options as map(*)?) as xs:base64Binary
archive:create-from($path as xs:string, $options as map(*)?, $entries as item()*) as xs:base64Binary
Summary This convenience function creates an archive from all files in the specified directory $path.
The $options parameter contains archiving options, and the files to be archived can be limited via $entries. The format of the last two arguments is identical to archive:create, but two additional options are available:
  • recursive: parse all files recursively (default: true; ignored if entries are specified via the last argument).
  • root-dir: use name of supplied directory as archive root directory (default: false).
Errors file:no-dir: the specified path does not point to a directory.
file:is-dir: one of the specified entries points to a directory.
file:not-found: a specified entry does not exist.
error: archive creation failed for some other reason.
Examples This example writes the files of a user’s home directory to archive.zip:

<syntaxhighlight lang="xquery">
let $zip := archive:create-from('/home/user/')
return file:write-binary('archive.zip', $zip)
</syntaxhighlight>
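The two additional options can be supplied in the same map as the archiving options. A hedged sketch, assuming that the options accept boolean values and that the directory and output file names are placeholders:

```xquery
(: Sketch: archive only the top-level files of a directory and keep
   the directory name as the root of the archive.
   '/home/user/docs/' and 'docs.zip' are placeholder names. :)
let $options := map {
  'recursive': false(),  (: do not descend into subdirectories :)
  'root-dir' : true()    (: prefix entries with the directory name :)
}
let $zip := archive:create-from('/home/user/docs/', $options)
return file:write-binary('docs.zip', $zip)
```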

archive:entries

Signatures archive:entries($archive as xs:base64Binary) as element(archive:entry)*
Summary Returns the entry descriptors of the specified $archive. A descriptor contains the following attributes, provided that they are available in the archive format:
  • size: original file size
  • last-modified: timestamp, formatted as xs:dateTime
  • compressed-size: compressed file size

An example:
<syntaxhighlight lang="xml">
<archive:entry size="1840" last-modified="2009-03-20T03:30:32" compressed-size="672">
  doc/index.html
</archive:entry>
</syntaxhighlight>

Errors error: archive processing failed for some other reason.
Examples Sums up the file sizes of all entries of a ZIP archive:

<syntaxhighlight lang="xquery">
sum(archive:entries(file:read-binary('zip.zip'))/@size)
</syntaxhighlight>
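The descriptors can also be processed individually. The following sketch lists each entry name with its original size, largest first; archive.zip is a placeholder, and because the attributes are only present if the archive format provides them, the size may be empty for some entries.

```xquery
(: Sketch: list entry names and sizes of a placeholder archive :)
let $archive := file:read-binary('archive.zip')
for $entry in archive:entries($archive)
(: @size may be missing, in which case the sort key is empty :)
order by xs:integer($entry/@size) descending
return string($entry) || ': ' || $entry/@size || ' bytes'
```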

archive:options

Signatures archive:options($archive as xs:base64Binary) as map(*)
Summary Returns the options of the specified $archive in the format specified by archive:create.
Errors format: the packing format is not supported.
error: archive processing failed for some other reason.
Examples A standard ZIP archive will return the following options:

<syntaxhighlight lang="xquery">
map {
  "format": "zip",
  "algorithm": "deflate"
}
</syntaxhighlight>
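Since the result has the format expected by archive:create, it can be passed on unchanged. A small sketch under that assumption, with archive.zip and hello.txt as placeholder names:

```xquery
(: Sketch: read the options of an existing archive and reuse them
   to create a new archive with the same format and algorithm :)
let $options := archive:options(file:read-binary('archive.zip'))
return (
  $options?format,  (: the packing format, looked up in the map :)
  archive:create('hello.txt', 'Hello World', $options)
)
```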

archive:extract-text

Signatures archive:extract-text($archive as xs:base64Binary) as xs:string*
archive:extract-text($archive as xs:base64Binary, $entries as item()*) as xs:string*
archive:extract-text($archive as xs:base64Binary, $entries as item()*, $encoding as xs:string) as xs:string*
Summary Extracts entries of the specified $archive and returns them as texts.
The returned entries can be limited via $entries. The format of the argument is the same as for archive:create (attributes will be ignored).
The encoding of the input files can be specified via $encoding.
Errors encode: the specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if CHECKSTRINGS is turned off.
error: archive processing failed for some other reason.
Examples The following expression extracts all .txt files from an archive:

<syntaxhighlight lang="xquery">
let $archive := file:read-binary("documents.zip")
for $entry in archive:entries($archive)[ends-with(., '.txt')]
return archive:extract-text($archive, $entry)
</syntaxhighlight>
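If an entry is not encoded as UTF-8, the third argument can be used. A minimal sketch, assuming a hypothetical entry legacy/readme.txt that was stored as ISO-8859-1:

```xquery
(: Sketch: extract a single entry by name and decode it with an
   explicit encoding; the entry name is a placeholder :)
let $archive := file:read-binary('documents.zip')
return archive:extract-text($archive, 'legacy/readme.txt', 'ISO-8859-1')
```

If the encoding does not match the stored bytes, the encode error may be raised.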

archive:extract-binary

Signatures archive:extract-binary($archive as xs:base64Binary) as xs:base64Binary*
archive:extract-binary($archive as xs:base64Binary, $entries as item()*) as xs:base64Binary*
Summary Extracts entries of the specified $archive and returns them as binaries.
The returned entries can be limited via $entries. The format of the argument is the same as for archive:create (attributes will be ignored).
Errors error: archive processing failed for some other reason.
Examples This example unzips all files of an archive to the current directory:

<syntaxhighlight lang="xquery">
let $archive  := file:read-binary('archive.zip')
let $entries  := archive:entries($archive)
let $contents := archive:extract-binary($archive)
return for-each-pair($entries, $contents, function($entry, $content) {
  file:create-dir(replace($entry, "[^/]+$", "")),
  file:write-binary($entry, $content)
})
</syntaxhighlight>

archive:extract-to

Signatures archive:extract-to($path as xs:string, $archive as xs:base64Binary) as empty-sequence()
archive:extract-to($path as xs:string, $archive as xs:base64Binary, $entries as item()*) as empty-sequence()
Summary This convenience function writes files of an $archive directly to the specified directory $path.
The archive entries to be written can be restricted via $entries. The format of the argument is the same as for archive:create (attributes will be ignored).
Errors error: archive processing failed for some other reason.
Examples The following expression unzips all files of an archive to the current directory:

<syntaxhighlight lang="xquery">
archive:extract-to('.', file:read-binary('archive.zip'))
</syntaxhighlight>
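The optional $entries argument restricts what is written. The following sketch extracts only the XML files of an archive; the target directory output/ and the archive name are placeholders, and the target is assumed to be writable.

```xquery
(: Sketch: write only the .xml entries of an archive to a directory :)
let $archive := file:read-binary('archive.zip')
let $entries := archive:entries($archive)[ends-with(., '.xml')]
return archive:extract-to('output/', $archive, $entries)
```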

archive:update

Signatures archive:update($archive as xs:base64Binary, $entries as item()*, $contents as item()*) as xs:base64Binary
Summary Creates an updated version of the specified $archive with new or replaced entries.
The format of $entries and $contents is the same as for archive:create.
Errors number: the number of entries and contents differs.
descriptor: entry descriptors contain invalid entry names, timestamps, compression levels or encodings.
encode: the specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if CHECKSTRINGS is turned off.
modify: the entries of the given archive cannot be modified.
error: archive creation failed for some other reason.
Examples This example replaces texts in a Word document:

<syntaxhighlight lang="xquery">
declare variable $input  := "HelloWorld.docx";
declare variable $output := "HelloUniverse.docx";
declare variable $doc    := "word/document.xml";

let $archive := file:read-binary($input)
let $entry   :=
  copy $c := fn:parse-xml(archive:extract-text($archive, $doc))
  modify replace value of node $c//*[text() = "HELLO WORLD!"] with "HELLO UNIVERSE!"
  return fn:serialize($c)
let $updated := archive:update($archive, $doc, $entry)
return file:write-binary($output, $updated)
</syntaxhighlight>
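A simpler case is adding a single text entry. This is a sketch only; old.zip, new.zip and CHANGES.txt are placeholder names, and an entry with the same name would be replaced rather than duplicated, as described in the summary.

```xquery
(: Sketch: add (or replace) one text entry and write the result
   to a new file, leaving the original archive untouched :)
let $archive := file:read-binary('old.zip')
let $updated := archive:update(
  $archive,
  'CHANGES.txt',
  'Updated on ' || current-date()
)
return file:write-binary('new.zip', $updated)
```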

archive:delete

Signatures archive:delete($archive as xs:base64Binary, $entries as item()*) as xs:base64Binary
Summary Deletes entries from an $archive.
The format of $entries is the same as for archive:create.
Errors modify: the entries of the given archive cannot be modified.
error: archive creation failed for some other reason.
Examples This example deletes all HTML files in an archive and creates a new file:

<syntaxhighlight lang="xquery">
let $zip := file:read-binary('old.zip')
let $entries := archive:entries($zip)[matches(., '\.x?html?$', 'i')]
return file:write-binary('new.zip', archive:delete($zip, $entries))
</syntaxhighlight>

Errors

Code Description
descriptor Entry descriptors contain invalid entry names, timestamps or compression levels.
encode The specified encoding is invalid or not supported, or the string conversion failed. Invalid XML characters will be ignored if CHECKSTRINGS is turned off.
error Archive processing failed for some other reason.
format The packing format or the specified option is invalid or not supported.
modify The entries of the given archive cannot be modified.
number The number of specified entries and contents differs.
single The chosen archive format only allows single entries.
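
Because the errors are assigned to the module namespace, they can be caught by name with try/catch. A minimal sketch, assuming that supplying two entry names with a single content item raises the number error; the entry names are placeholders:

```xquery
try {
  (: two entry names, but only one content item :)
  archive:create(('a.txt', 'b.txt'), 'only one')
} catch archive:number {
  (: $err:description holds the message of the caught error :)
  'Mismatch: ' || $err:description
}
```

A wildcard such as catch archive:* would catch any error of this module instead of a single code.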

Changelog

Version 9.0
  • Updated: archive:create-from: options added
  • Updated: error codes updated; errors now use the module namespace
Version 8.5
Version 8.3
Version 7.7

The module was introduced with Version 7.3.