Difference between revisions of "Catalog Resolver"
Revision as of 12:00, 5 March 2019
This article is part of the Advanced User's Guide. It clarifies how to deal with external DTD declarations when parsing XML data.
Overview
XML documents often rely on Document Type Definitions (DTDs). While parsing a document with BaseX, entities can be resolved with respect to that particular DTD. By default, the DTD is only used for entity resolution.
XHTML, for example, defines its doctype via the following line:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Fetching xhtml1-strict.dtd involves network traffic. When dealing with single files, this may seem tolerable, but importing large collections benefits from caching these resources. Depending on the remote server, caching DTDs locally can speed up imports significantly.
XML Entity and URI Resolvers
BaseX relies on the Apache-maintained XML Commons Resolver. The xml-resolver-1.2.jar library is included in the full distributions of BaseX. If the resolver is not found in the classpath and Java 8 is used, Java’s built-in resolver will be applied (via com.sun.org.apache.xml.internal.resolver.*).
To enable entity resolving you have to provide a valid XML Catalog file, so that the parser knows where to look for mirrored DTDs.
A simple working example for XHTML might look like this:
<catalog prefer="system" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD/" rewritePrefix="file:///path/to/dtds/" />
</catalog>
This rewrites every system identifier that starts with http://www.w3.org/TR/xhtml1/DTD/ so that it starts with file:///path/to/dtds/ instead.
The XHTML DTD xhtml1-strict.dtd and all its linked resources will now be loaded from the specified path.
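The effect of the rewriteSystem entry can be illustrated with a minimal Java sketch. It only mimics the prefix substitution described here and is not the code used by the XML Commons Resolver or by BaseX; the class name RewriteSystemSketch and the method rewrite are hypothetical, and the two constants are copied from the catalog example.

```java
public class RewriteSystemSketch {
    // Values copied from the rewriteSystem entry of the example catalog.
    static final String START = "http://www.w3.org/TR/xhtml1/DTD/";
    static final String PREFIX = "file:///path/to/dtds/";

    /**
     * Replaces the matching start string with the rewrite prefix.
     * Identifiers that do not start with START are returned unchanged
     * (hypothetical helper, for illustration only).
     */
    static String rewrite(String systemId) {
        if (systemId.startsWith(START)) {
            return PREFIX + systemId.substring(START.length());
        }
        return systemId;
    }

    public static void main(String[] args) {
        // The DTD itself and a resource it links to share the same prefix,
        // so both end up pointing to the local directory.
        System.out.println(rewrite("http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"));
        System.out.println(rewrite("http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent"));
    }
}
```

Because the substitution applies to the prefix only, the local directory has to mirror the file names used below the original URL.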
GUI Mode
When running BaseX in GUI mode, simply provide the path to your XML Catalog file in the Parsing Tab of the Database Creation Dialog.
Console & Server Mode
To enable entity resolving in console mode, assign a catalog file path to the CATFILE option. All subsequent ADD commands will use the specified catalog file to resolve entities.
The paths to your catalog file and the actual DTDs are either absolute or relative to the current working directory. When using BaseX in client/server mode, relative paths are resolved against the server's working directory, not the client's.
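The difference between absolute and relative catalog paths can be sketched in Java as follows. This assumes ordinary file system path resolution and is not taken from the BaseX sources; the class name CatalogPathSketch, the method resolve and the path etc/catalog.xml are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class CatalogPathSketch {
    /**
     * Resolves a catalog path against a working directory.
     * An absolute catalog path is returned as is; a relative one is
     * appended to the working directory (hypothetical helper).
     */
    static Path resolve(String workingDir, String catalogPath) {
        return Paths.get(workingDir).resolve(catalogPath).normalize();
    }

    public static void main(String[] args) {
        // In client/server mode, the relevant directory is the one
        // the server process was started in.
        String cwd = System.getProperty("user.dir");
        System.out.println(resolve(cwd, "etc/catalog.xml"));
    }
}
```

Using an absolute path for the catalog file avoids surprises when the client and the server are started from different directories.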
Please Note
Entity resolving only works if the internal XML parser is switched off (which is the default case). If you use the internal parser, you can manually specify whether you want to parse DTDs and entities or not.