This article is part of the [[XQuery|XQuery Portal]]. It summarizes the features of the [https://www.w3.org/TR/xpath-full-text-10/ W3C XQuery Full Text 1.0] Recommendation, and custom features of the implementation in BaseX.
Please read the separate [[Indexes#Full-Text Index|Full-Text Index]] section in our documentation if you want to learn how to evaluate full-text requests on large databases within milliseconds.
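As a brief illustration of the kind of expression the Recommendation defines, the following sketch uses only the standard {{Code|contains text}} operator together with the {{Code|ftand}} and {{Code|ftor}} connectives. The sample strings are made up for illustration; under the default match options, each of the three expressions should yield {{Code|true}}:
<syntaxhighlight lang="xquery">
(: illustrative strings only; no database or index is required :)
(
  (: a single token is found in the tokenized input string :)
  'Scrambled eggs' contains text 'eggs',
  (: both tokens must occur; matching is case-insensitive by default :)
  'Scrambled eggs' contains text 'scrambled' ftand 'eggs',
  (: at least one of the two tokens must occur :)
  'Scrambled eggs' contains text 'ham' ftor 'eggs'
)
</syntaxhighlight>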
* [https://files.basex.org/maven/org/apache/lucene-stemmers/3.4.0/lucene-stemmers-3.4.0.jar lucene-stemmers-3.4.0.jar] includes the Snowball and Lucene stemmers for the following languages: Arabic, Bulgarian, Catalan, Czech, Danish, Dutch, Finnish, French, Hindi, Hungarian, Italian, Latvian, Lithuanian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish.
* [https://osdn.net/projects/igo/releases/ igo-0.4.3.jar]: [[Full-Text: Japanese|An additional article]] explains how Igo can be integrated, and how Japanese texts are tokenized and stemmed.
The JAR files are included in the ZIP and EXE distributions of BaseX.
</syntaxhighlight>
The format of the thesaurus files must be the same as the format of the thesauri provided by the [https://dev.w3.org/2007/xpath-full-text-10-test-suite XQuery and XPath Full Text 1.0 Test Suite]. It is an XML format whose structure is defined by an [https://dev.w3.org/cvsweb/~checkout~/2007/xpath-full-text-10-test-suite/TestSuiteStagingArea/TestSources/thesaurus.xsd?rev=1.3;content-type=application%2Fxml XSD schema].
==Fuzzy Querying==
=Mixed Content=
When working with so-called narrative XML documents, such as HTML, [https://tei-c.org/ TEI], or [https://docbook.org/ DocBook] documents, you typically have ''mixed content'', i.e., elements containing a mix of text and markup, such as:
<syntaxhighlight lang="xml">
|-
| {{Code|decomposition}}
| Defines how composed characters are handled. Three decompositions are supported: {{Code|none}}, {{Code|standard}}, and {{Code|full}}. More details are found in the [https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/Collator.html JavaDoc] of the JDK.
|}
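The following is a minimal sketch of how the {{Code|decomposition}} option might be set. It assumes that options are passed as semicolon-separated parameters of a BaseX collation URI of the form {{Code|http://basex.org/collation?...}}; the {{Code|lang}} value and the compared strings are chosen for illustration only:
<syntaxhighlight lang="xquery">
(: assumption: options are appended to the BaseX collation URI :)
declare default collation 'http://basex.org/collation?lang=de;decomposition=full';
(: compares two illustrative strings with the collation declared above :)
compare('Straße', 'Strasse')
</syntaxhighlight>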