Changes

3,855 bytes added ,  13:16, 2 July 2020
This article is part of the [[Advanced User's Guide]]. It explains how to map system IDs (DTD locations) and URIs to local resources when parsing and transforming XML data with BaseX.
==Introduction==

XML documents often rely on Document Type Definitions (DTDs). Entities can be resolved with respect to that particular DTD. By default, the DTD is only used for entity resolution.
XHTML, for example, defines its doctype via the following line:

<syntaxhighlight lang="xml">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
</syntaxhighlight>

Fetching <code>xhtml1-strict.dtd</code> from the W3C’s server involves network traffic. When dealing with single files, this may seem tolerable, but importing large collections benefits from caching these resources. Depending on the remote server, caching DTDs locally can yield significant speed improvements.

To address these issues, the [https://www.oasis-open.org/committees/download.php/14809/xml-catalogs.html XML Catalogs Standard] defines an entity catalog that maps both external identifiers and arbitrary URI references to URI references.

Another application for XML catalogs is to provide local resources for reusable XSLT stylesheet libraries that are imported from a canonical location. This is described in greater detail in the [[#URI Rewrites|URI Rewrites]] section below.

==Usage==

===System ID (DTD Location) Rewrites===
BaseX relies on the Apache-maintained [https://xml.apache.org/commons XML Commons Resolver]. The {{Code|xml-resolver-1.2.jar}} library is included in the full distributions of BaseX. If the resolver is not found in the classpath and Java 8 is used, Java’s built-in resolver will be applied (via <code>com.sun.org.apache.xml.internal.resolver.*</code>).
To enable entity resolving, you have to provide a valid XML Catalog file, so that the parser knows where to look for mirrored DTDs.

A simple working example for XHTML might look like this:

<syntaxhighlight lang="xml">
<?xml version="1.0"?>
<catalog prefer="system" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD/" rewritePrefix="file:///path/to/dtds/"/>
</catalog>
</syntaxhighlight>

This rewrites all system IDs starting with <code><nowiki>http://www.w3.org/TR/xhtml1/DTD/</nowiki></code> to <code>file:///path/to/dtds/</code>. For example, consider parsing the following XML file:

<syntaxhighlight lang="xml">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"/>
</syntaxhighlight>

The XHTML DTD <code>xhtml1-transitional.dtd</code> and all its linked resources will then be loaded from the specified path.

The catalog file ''etc/w3-catalog.xml'' in the full distributions can be used out of the box. It defines rewriting for some common W3C DTD files.

===URI Rewrites===

Consider a library of reusable XSLT stylesheets. For performance reasons, this library will be cached locally. However, the import URI for a given stylesheet should always be the same, independent of the relative or absolute path at which it happens to be stored locally. Example:

<syntaxhighlight lang="xml">
<xsl:import href="http://acme.com/xsltlib/acme2html/1.0/acme2html.xsl"/>
</syntaxhighlight>

The XSLT stylesheet might not even be available from this location. The URI serves as a canonical location identifier for this XSLT stylesheet. A local copy of the <code>acme2html/1.0/</code> directory is expected to reside somewhere, and the location of this directory relative to the local XML catalog file is specified in an entry in this catalog, like this:

<syntaxhighlight lang="xml">
<rewriteURI uriStartString="http://acme.com/xsltlib/acme2html/1.0/" rewritePrefix="../acmehtml10/"/>
</syntaxhighlight>

This way, XSLT import URIs don’t have to be adjusted for the relative or absolute location of the XSLT library’s local copy. The same URI rewriting works for resources retrieved by the <code>doc()</code> function from within an XSLT stylesheet. See [[XSLT Module]] for details on how to invoke XSLT stylesheets from within BaseX.
NOTE: This URI rewriting is currently restricted to XSLT stylesheets. It has not yet been enabled for the XQuery function <code>doc()</code> or for XSD schema locations.
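The rewrite entries in this section follow a simple prefix rule. The following Python sketch models that rule for illustration only; it is not how BaseX works internally (BaseX delegates to the Apache XML Commons Resolver). It reads <code>rewriteSystem</code> and <code>rewriteURI</code> entries, resolves a relative <code>rewritePrefix</code> against the location of the catalog file, and, as the XML Catalogs standard requires, uses the matching entry with the longest start string. Other entry types, identifier normalization and catalog delegation are deliberately ignored, and the catalog location <code>file:///home/user/xmlcatalog/catalog.xml</code> is a made-up example.

<syntaxhighlight lang="python">
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def load_rewrites(catalog_xml, catalog_uri):
    """Collect (startString, rewritePrefix) pairs from rewriteSystem and rewriteURI entries.

    A relative rewritePrefix is resolved against the URI of the catalog file itself.
    """
    root = ET.fromstring(catalog_xml)
    rules = {"system": [], "uri": []}
    for kind, tag, attr in (("system", "rewriteSystem", "systemIdStartString"),
                            ("uri", "rewriteURI", "uriStartString")):
        for entry in root.iter(f"{{{CATALOG_NS}}}{tag}"):
            prefix = urljoin(catalog_uri, entry.get("rewritePrefix"))
            rules[kind].append((entry.get(attr), prefix))
    return rules


def rewrite(identifier, rules):
    """Apply the matching entry with the longest start string; None means 'no match'."""
    matches = [(start, prefix) for start, prefix in rules if identifier.startswith(start)]
    if not matches:
        return None  # a real resolver would fall back to the original identifier
    start, prefix = max(matches, key=lambda match: len(match[0]))
    return prefix + identifier[len(start):]


# The two catalog entries from this article, combined in one (hypothetical) catalog file.
CATALOG = """<catalog prefer="system" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD/"
                 rewritePrefix="file:///path/to/dtds/"/>
  <rewriteURI uriStartString="http://acme.com/xsltlib/acme2html/1.0/"
              rewritePrefix="../acmehtml10/"/>
</catalog>"""

rules = load_rewrites(CATALOG, "file:///home/user/xmlcatalog/catalog.xml")
print(rewrite("http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd", rules["system"]))
print(rewrite("http://acme.com/xsltlib/acme2html/1.0/acme2html.xsl", rules["uri"]))
</syntaxhighlight>

The first call prints <code>file:///path/to/dtds/xhtml1-transitional.dtd</code>. The second prints <code>file:///home/user/acmehtml10/acme2html.xsl</code>, because the relative prefix <code>../acmehtml10/</code> is resolved against the directory of the assumed catalog file.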
===GUI Mode===
[[File:catalog-file.jpg|thumb|Location for the Catalog File]]
When running BaseX in GUI mode, enable DTD parsing and provide the path to your XML Catalog file in the '''Parsing''' tab of the Database Creation Dialog.
===Console & Server Mode===
To enable entity resolving in Console Mode, enable the {{Option|DTD}} option and assign the path to your XML catalog file to the {{Option|CATFILE}} option. All subsequent commands for adding documents will use the specified catalog file to resolve entities.

Paths to your catalog file and the actual DTDs are either absolute or relative to the ''current working directory''. When using BaseX in client-server mode, they are resolved against the working directory of the ''server''.

===Additional Notes===

Entity resolving only works if the [[Parsers#XML Parsers|internal XML parser]] is switched off (which is the default). The runtime properties of the catalog resolver can be changed by setting system properties, or by adding a ''CatalogManager.properties'' file to the classpath. By default, and if the system property {{Code|xml.catalog.ignoreMissing}} is not assigned, no warnings will be output to standard error if the properties file or resources linked from that file are not found. See [https://xerces.apache.org/xml-commons/components/resolver/resolver-article.html#ctrlresolver Controlling the Catalog Resolver] for more information.

When using a catalog within an XQuery module, the global <code>db:catfile</code> option may not be set in this module. You can set it via a pragma instead:
<syntaxhighlight lang="xquery">
(# db:catfile xmlcatalog/catalog.xml #) {
  xslt:transform(db:open('acme_content')[1], '../acmecustom/acmehtml.xsl')
}
</syntaxhighlight>

It is assumed that the stylesheet <code>../acmecustom/acmehtml.xsl</code> (location relative to the current XQuery script or module) imports <code>acme2html/1.0/acme2html.xsl</code> by its canonical URI, which will be resolved to a local URI by the catalog resolver.
Please note that since catalog-based URI rewriting does not yet work for URIs accessed from XQuery, you cannot pass a canonical location that needs to be catalog-resolved as the second argument of <code>xslt:transform()</code>.

The catalog location in the pragma can be given relative to the current working directory (the directory that is returned by <code>file:current-dir()</code>) or as an absolute operating system path. The catalog location in the pragma is not an XQuery expression: no concatenation or other operations may occur in the pragma, and the location string must not be surrounded by quotes.
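The rules for the pragma's catalog location can be illustrated with a small Python sketch. The helper <code>resolve_pragma_catalog</code> is hypothetical and not part of BaseX; it assumes POSIX-style paths, treats the location as a literal string, rejects quoted values, and resolves relative paths against a given working directory (the one <code>file:current-dir()</code> would report).

<syntaxhighlight lang="python">
from pathlib import PurePosixPath

QUOTES = ("'", '"')


def resolve_pragma_catalog(location, cwd):
    """Hypothetical helper: resolve the catalog location written in a db:catfile pragma.

    The location is a literal path, not an XQuery expression, so quotes are rejected
    instead of being stripped.
    """
    if location.startswith(QUOTES) or location.endswith(QUOTES):
        raise ValueError(f"catalog location must not be quoted: {location}")
    path = PurePosixPath(location)
    # Absolute paths are used as they are; relative ones are resolved against cwd.
    return path if path.is_absolute() else PurePosixPath(cwd) / path


print(resolve_pragma_catalog("xmlcatalog/catalog.xml", "/home/user/project"))
print(resolve_pragma_catalog("/etc/catalogs/catalog.xml", "/home/user/project"))
</syntaxhighlight>

With the assumed working directory <code>/home/user/project</code>, the pragma value <code>xmlcatalog/catalog.xml</code> from the example above resolves to <code>/home/user/project/xmlcatalog/catalog.xml</code>, while the absolute path is returned unchanged.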
==Links==

* [https://www.oasis-open.org/committees/download.php/14809/xml-catalogs.html XML Catalogs. OASIS Standard, Version 1.1. 07-October-2005]
* [https://en.wikipedia.org/wiki/Document_Type_Definition Wikipedia on Document Type Definitions]
* [http://xml.apache.org/commons/components/resolver/resolver-article.html Apache XML Commons Article on Entity Resolving]
* [http://java.sun.com/webservices/docs/1.6/jaxb/catalog.html XML Entity and URI Resolvers], Sun
* [http://www.oasis-open.org/committees/download.php/14810/xml-catalogs.pdf XML Catalogs. OASIS Standard, Version 1.1. 07-October-2005 (PDF)]