Difference between revisions of "Catalog Resolver"

From BaseX Documentation

Revision as of 12:14, 24 January 2011

Overview

XML documents often rely on Document Type Definitions (DTDs). When BaseX parses a document, elements and entities can be checked for validity against the document's DTD. By default, the DTD is used only for entity resolution.

XHTML, for example, defines its doctype via the following line:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 

Fetching xhtml1-strict.dtd involves network traffic. For a single file this may seem tolerable, but importing large collections benefits from caching these resources. Depending on the remote server, caching DTDs locally can speed up imports significantly.

XML Entity and URI Resolvers in BaseX

BaseX comes with a default URI resolver that is usable out of the box.

To enable entity resolving, you have to provide a valid XML Catalog file, so that the parser knows where to look for mirrored DTDs.

A simple working example for XHTML might look like this:

<?xml version="1.0"?>
<catalog prefer="system" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
   <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD/" rewritePrefix="file:///path/to/dtds/" />
</catalog>

This rewrites every system identifier that starts with http://www.w3.org/TR/xhtml1/DTD/ so that it starts with file:///path/to/dtds/ instead.

The XHTML DTD xhtml1-strict.dtd and all its linked resources will now be loaded from the specified path.
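
The effect of the rewriteSystem entry above can be sketched in a few lines of Java. This is only an illustration of the prefix substitution, not the resolver code that BaseX or the JDK actually runs; the class name RewriteSystemSketch and its methods are made up for this example.

```java
import java.util.Optional;

// Illustrative only: mimics the effect of a single <rewriteSystem> catalog entry.
// This is NOT the resolver implementation used by BaseX or the JDK.
final class RewriteSystemSketch {
    private final String systemIdStartString;
    private final String rewritePrefix;

    RewriteSystemSketch(String systemIdStartString, String rewritePrefix) {
        this.systemIdStartString = systemIdStartString;
        this.rewritePrefix = rewritePrefix;
    }

    // Returns the rewritten system identifier, or empty if this entry does not match.
    Optional<String> rewrite(String systemId) {
        if (!systemId.startsWith(systemIdStartString)) {
            return Optional.empty();
        }
        // Keep everything after the matched prefix and attach it to the new prefix.
        String remainder = systemId.substring(systemIdStartString.length());
        return Optional.of(rewritePrefix + remainder);
    }

    public static void main(String[] args) {
        RewriteSystemSketch entry = new RewriteSystemSketch(
            "http://www.w3.org/TR/xhtml1/DTD/", "file:///path/to/dtds/");
        System.out.println(
            entry.rewrite("http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"));
        // prints Optional[file:///path/to/dtds/xhtml1-strict.dtd]
    }
}
```

A system identifier that does not start with the configured prefix is left untouched by this entry, which is why the sketch returns an empty result in that case.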

GUI Mode

When running BaseX in GUI mode, simply provide the path to your XML Catalog file in the Parsing Tab of the Database Creation Dialog.

Console & Server Mode

To enable entity resolving in console mode, set the following option:

  • SET CATFILE [path]

Now entity resolving is active for the current session. All subsequent ADD commands will use the catalog file to resolve entities.

The paths to your catalog file and the actual DTDs are either absolute or relative to the current working directory. When using BaseX in client-server mode, relative paths are resolved against the server's working directory, not the client's.
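
The path rule described above can be sketched with the standard java.nio.file API. This is a minimal illustration of "absolute, or relative to the working directory", not BaseX's own code; the class name CatalogPathSketch and the path etc/catalog.xml are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: shows how a catalog path is interpreted according to the
// rule in the text. The class and the example path are assumptions.
final class CatalogPathSketch {
    // Absolute paths are kept; relative paths are resolved against workingDir.
    static Path resolve(String catalogPath, Path workingDir) {
        Path p = Paths.get(catalogPath);
        return (p.isAbsolute() ? p : workingDir.resolve(p)).normalize();
    }

    public static void main(String[] args) {
        // "user.dir" is the working directory of the running JVM. In
        // client-server mode, the relevant JVM is the server's.
        Path cwd = Paths.get(System.getProperty("user.dir"));
        System.out.println(resolve("etc/catalog.xml", cwd));
    }
}
```

When a catalog file is not found in client-server mode, printing the server's working directory in this way is a quick check of where relative paths end up.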

Please Note

Entity resolving only works when the internal parser is disabled: SET INTPARSE false. INTPARSE is set to false by default.

Using the internal parser lets you specify manually whether DTDs and entities are parsed.

Using other Resolvers

There might be some cases when you do not want to use the built-in resolver that Java provides by default (via com.sun.org.apache.xml.internal.resolver.*).

BaseX also supports the Apache-maintained XML Commons Resolver, which is available as a separate download.

To use it, add resolver.jar to the classpath when starting BaseX:

java -cp basex.jar:resolver.jar org.basex.BaseXServer
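
To confirm that resolver.jar was actually picked up, a small check like the following may help. It is a sketch that only tests whether a class from XML Commons Resolver can be loaded; it makes no claim about how BaseX itself detects the library, and the class name ResolverClasspathCheck is made up for this example.

```java
// Illustrative only: checks whether a class is loadable from the classpath.
// How BaseX itself selects a resolver is not shown here.
final class ResolverClasspathCheck {
    // A class shipped in Apache XML Commons Resolver (resolver.jar).
    static final String COMMONS_RESOLVER =
        "org.apache.xml.resolver.tools.CatalogResolver";

    static boolean isAvailable(String className) {
        try {
            // Look the class up without initializing it.
            Class.forName(className, false,
                ResolverClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("XML Commons Resolver on classpath: "
            + isAvailable(COMMONS_RESOLVER));
    }
}
```

Run it with the same -cp value that is used to start BaseX. Note that the classpath separator is ":" on Unix-like systems and ";" on Windows.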

More Information