39 bytes removed, 22:28, 25 May 2016
* Paragraph delimiters are newlines (<code>&amp;#xa;</code>).
The basic JAR file of BaseX comes with built-in stemming support for English, German, Greek and Indonesian. Stemming for additional languages becomes available if the following libraries are found in the classpath:
* [http://files.basex.org/maven/org/apache/lucene-stemmers/3.4.0/lucene-stemmers-3.4.0.jar lucene-stemmers-3.4.0.jar]: includes Snowball and Lucene stemmers and extends stemming support to Bulgarian, Catalan, Czech, Danish, Dutch, Finnish, French, Hindi, Hungarian, Italian, Latvian, Lithuanian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish.
* [http://en.sourceforge.jp/projects/igo/releases/ igo-0.4.3.jar]: [[Full-Text: Japanese|An additional article]] explains how Igo can be integrated, and how Japanese texts are tokenized and stemmed.
The JAR files are included in the ZIP and EXE distributions of BaseX.
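The general idea of enabling a feature only when an optional library is found in the classpath can be illustrated with a minimal Java sketch. It is not BaseX's actual implementation. The class <code>StemmerSupport</code>, its methods, and the probe class names (<code>org.tartarus.snowball.ext.FrenchStemmer</code> for the Lucene stemmers JAR, <code>net.reduls.igo.Tagger</code> for Igo) are assumptions chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: derive the available stemming languages from the classpath. */
public class StemmerSupport {
    // Languages covered by the basic JAR file, as listed above.
    static final List<String> BUILT_IN =
        List.of("English", "German", "Greek", "Indonesian");

    // Languages added by lucene-stemmers-3.4.0.jar, as listed above.
    static final List<String> LUCENE_EXTRA = List.of(
        "Bulgarian", "Catalan", "Czech", "Danish", "Dutch", "Finnish", "French",
        "Hindi", "Hungarian", "Italian", "Latvian", "Lithuanian", "Norwegian",
        "Portuguese", "Romanian", "Russian", "Spanish", "Swedish", "Turkish");

    // Returns true if the class can be located, without initializing it.
    static boolean onClasspath(String className) {
        try {
            Class.forName(className, false, StemmerSupport.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    static List<String> supportedLanguages() {
        List<String> langs = new ArrayList<>(BUILT_IN);
        // Assumed probe class for lucene-stemmers-3.4.0.jar (hypothetical choice).
        if (onClasspath("org.tartarus.snowball.ext.FrenchStemmer")) {
            langs.addAll(LUCENE_EXTRA);
        }
        // Assumed probe class for igo-0.4.3.jar (hypothetical choice).
        if (onClasspath("net.reduls.igo.Tagger")) {
            langs.add("Japanese");
        }
        return langs;
    }

    public static void main(String[] args) {
        // Prints the built-in languages, plus any enabled by optional JARs.
        System.out.println(supportedLanguages());
    }
}
```

Running the class with only the compiled sketch on the classpath should list just the four built-in languages; adding a JAR to <code>-cp</code> that contains the probed class would extend the list. Probing with <code>initialize=false</code> avoids running static initializers of classes that are only being tested for presence.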
The following two queries, which both return <code>true</code>, demonstrate that stemming depends on the selected language: