Changes

489 bytes added ,  12:46, 20 December 2010
m
→‎Indexes: Added language option
The following options are available:
* '''Language''': language-specific parsers will be used; this option affects tokenization and stemming (if enabled). BaseX comes with built-in support for English and German. Other languages can be supported using stemmers from [http://lucene.apache.org/java/docs/index.html Lucene] or [http://snowball.tartarus.org Snowball]. If the corresponding JARs or classes are in the Java classpath, BaseX will automatically make use of them.
* '''Support Wildcards''': a trie-based index can be applied to support wildcard searches (<code>SET WILDCARDS ON</code>)
* '''Stemming''': tokens are stemmed with the Porter Stemmer before being indexed (<code>SET STEMMING ON</code>)
* '''Case Sensitive''': tokens are indexed in case-sensitive mode (<code>SET CASESENS ON</code>)
* '''Diacritics''': diacritics are indexed as well (<code>SET DIACRITICS ON</code>)
* '''TF/IDF Scoring''': TF/IDF-based scoring values are calculated and stored in the index (<code>SET SCORING 1/2</code>; see below for details)
* '''Stopword List''': a stop word list can be defined to reduce the number of indexed tokens (<code>SET STOPWORDS FILENAME</code>)
'''Caution:''' The index will only be applied if the activated options are also specified in the query:
'''Index Options:''' Case Sensitive, Stemming ON
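
As a rough sketch of how the index options and the query options line up, the following command sequence builds a full-text index with stemming and case sensitivity, then runs a query that repeats both options. The database name <code>factbook</code>, the input file <code>factbook.xml</code>, and the search term are hypothetical. The option name <code>CASESENS</code> and the <code>#</code> comment lines are assumptions about the installed BaseX version; drop the comments if your version does not accept them.

<pre>
# Set the index options before the index is created (hypothetical database and file)
SET STEMMING ON
SET CASESENS ON
CREATE DB factbook factbook.xml
CREATE INDEX FULLTEXT
# The query repeats the same options, so the index can be applied
XQUERY //text()[. contains text "Index" using stemming using case sensitive]
</pre>

If the query left out <code>using stemming</code> or <code>using case sensitive</code>, its options would no longer match those of the index, and per the caution above the index would not be applied.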