1,369 bytes added, 12:53, 13 March 2019
This article is part of the [[Advanced User's Guide]]. It clarifies how to deal with external DTD declarations when parsing and transforming XML data.

==Introduction==

XML documents often rely on Document Type Definitions (DTDs). While parsing a document with BaseX, entities can be resolved with respect to that particular DTD. By default, the DTD is only used for entity resolution.

XHTML, for example, defines its doctype via the following line:
<pre class="brush:xml">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
</pre>
Fetching <code>xhtml1-strict.dtd</code> involves network traffic. When dealing with single files, this may seem tolerable, but importing large collections benefits from caching these resources locally. Depending on the remote server, you will experience significant speed improvements when caching DTDs locally.
To address these issues, the [https://www.oasis-open.org/committees/download.php/14809/xml-catalogs.html XML Entity Catalogs Standard] defines an entity catalog that maps both external identifiers and arbitrary URI references to URI references.

==Usage==

BaseX relies on the Apache-maintained [http://xml.apache.org/commons XML Commons Resolver]. The ''xml-resolver-1.2.jar'' library is included in the full distributions of BaseX. If the resolver is not found in the classpath, and if Java 8 is used, Java’s built-in resolver will be applied (via <code>com.sun.org.apache.xml.internal.resolver.*</code>).

To enable entity resolving, you have to provide a valid XML Catalog file, so that the parser knows where to look for mirrored DTDs. A simple working example for XHTML might look like this:
<pre class="brush:xml" start="0">
<?xml version="1.0"?>
<catalog prefer="system" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD/" rewritePrefix="file:///path/to/dtds/" />
</catalog>
</pre>
This rewrites all systemIds starting with <code><nowiki>http://www.w3.org/TR/xhtml1/DTD/</nowiki></code> to <code>file:///path/to/dtds/</code>. For example, if the following XML file is parsed:
<pre class="brush:xml" start="0">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"/>
</pre>
The XHTML DTD <code>xhtml1-transitional.dtd</code> and all its linked resources will now be loaded from the specified path.

The catalog file ''etc/w3-catalog.xml'' in the full distributions can be used out of the box. It defines rewriting for some common W3 DTD files.
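The prefix rewriting performed by a <code>rewriteSystem</code> entry can be sketched in a few lines of Python. This only illustrates the mapping; it is not how BaseX or the XML Commons Resolver implement it. The function names <code>load_rewrite_rules</code> and <code>rewrite_system_id</code> are made up for this example, and the catalog is the XHTML one shown earlier.

```python
import xml.etree.ElementTree as ET

# Namespace of OASIS XML Catalogs, as used in the catalog file.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

CATALOG = """<?xml version="1.0"?>
<catalog prefer="system" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD/" rewritePrefix="file:///path/to/dtds/" />
</catalog>"""


def load_rewrite_rules(catalog_xml):
    """Return (start string, replacement prefix) pairs from rewriteSystem entries."""
    root = ET.fromstring(catalog_xml)
    return [
        (entry.get("systemIdStartString"), entry.get("rewritePrefix"))
        for entry in root.iter("{%s}rewriteSystem" % CATALOG_NS)
    ]


def rewrite_system_id(system_id, rules):
    """Apply the rule with the longest matching start string, if any."""
    matches = [rule for rule in rules if system_id.startswith(rule[0])]
    if not matches:
        return system_id  # no rule applies: the identifier stays remote
    start, prefix = max(matches, key=lambda rule: len(rule[0]))
    return prefix + system_id[len(start):]


rules = load_rewrite_rules(CATALOG)
print(rewrite_system_id("http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd", rules))
# prints file:///path/to/dtds/xhtml1-transitional.dtd
```

Choosing the longest matching start string follows the rule the XML Catalogs standard gives for several matching <code>rewriteSystem</code> entries. Identifiers without a matching entry are returned unchanged, which corresponds to the DTD still being fetched from its original location.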
===GUI Mode===
[[File:catalog-file.jpg|thumb|Location for the Catalog File]]
When running BaseX in GUI mode, enable DTD parsing and provide the path to your XML Catalog file in the '''Parsing''' tab of the Database Creation Dialog.
===Console & Server Mode===
To enable entity resolving in console mode, enable the {{Option|DTD}} option and assign the path to your XML catalog file to the {{Option|CATFILE}} option, for example via <code>SET CATFILE [path]</code>. All subsequent commands for adding documents will use the specified catalog file to resolve entities.
Paths to your catalog file and the actual DTDs are either absolute or relative to the ''current working directory''. When using BaseX in client-server mode, they are resolved against the working directory of the ''server''.
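The path rule for catalog files and DTDs can be illustrated with a small Python sketch. It is not BaseX code: <code>resolve_against</code> and the directory names are hypothetical and only show which directory serves as the base for a relative path.

```python
from pathlib import PurePosixPath


def resolve_against(working_dir, path):
    """Return path unchanged if it is absolute, else resolve it against working_dir."""
    candidate = PurePosixPath(path)
    if candidate.is_absolute():
        return candidate
    return PurePosixPath(working_dir) / candidate


# Hypothetical working directories of a standalone console and a server process.
console_dir = "/home/user/project"
server_dir = "/opt/basex"

# Standalone mode: the local working directory is the base.
print(resolve_against(console_dir, "etc/w3-catalog.xml"))
# Client-server mode: the server's working directory is the base.
print(resolve_against(server_dir, "etc/w3-catalog.xml"))
# Absolute paths are the same in both cases.
print(resolve_against(server_dir, "/data/catalog.xml"))
```

The same relative path therefore points to different files depending on where the process that parses the documents was started. An absolute path avoids that ambiguity.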
===Additional Notes===

Entity resolving only works if the [[Parsers#XML Parsers|internal XML parser]] is switched off (which is the default case).

The runtime properties of the catalog resolver can be changed by setting system properties, or adding a ''CatalogManager.properties'' file to the classpath. By default, and if the system property {{Code|xml.catalog.ignoreMissing}} is not assigned, no warnings will be output to standard error if the properties file or resources linked from that file are not found. See [https://xerces.apache.org/xml-commons/components/resolver/resolver-article.html#ctrlresolver Controlling the Catalog Resolver] for more information.

==Links==
* [https://www.oasis-open.org/committees/download.php/14809/xml-catalogs.html XML Catalogs. OASIS Standard, Version 1.1. 07-October-2005] (HTML)
* [http://www.oasis-open.org/committees/download.php/14810/xml-catalogs.pdf XML Catalogs. OASIS Standard, Version 1.1. 07-October-2005] (PDF)
* [http://en.wikipedia.org/wiki/Document_Type_Definition Wikipedia on Document Type Definitions]
* [http://xml.apache.org/commons/components/resolver/resolver-article.html Apache XML Commons Article on Entity Resolving]
* [http://java.sun.com/webservices/docs/1.6/jaxb/catalog.html XML Entity and URI Resolvers], Sun