Changes

Full-Text

Revision as of 12:46, 20 December 2010

489 bytes added , 12:46, 20 December 2010

→‎Indexes: Added language option

The following options are available:

* '''Language''': language-specific parsers will be used; this option affects tokenization and stemming (if enabled). BaseX comes with built-in support for English and German. Other languages can be supported using stemmers from [http://lucene.apache.org/java/docs/index.html Lucene] or [http://snowball.tartarus.org Snowball]. If the corresponding JARs or classes are in the Java classpath, BaseX will automatically make use of them.
* '''Support Wildcards''': a trie-based index can be applied to support wildcard searches (<code>SET WILDCARDS ON</code>)
* '''Stemming''': tokens are stemmed with the Porter Stemmer before being indexed (<code>SET STEMMING ON</code>)
* '''Case Sensitive''': tokens are indexed in case-sensitive mode (<code>SET CASESENS ON</code>)
* '''Diacritics''': diacritics are indexed as well (<code>SET DIACRITICS ON</code>)
* '''TF/IDF Scoring''': TF/IDF-based scoring values are calculated and stored in the index (<code>SET SCORING 1/2</code>; see below for details)
* '''Stopword List''': a stop word list can be defined to reduce the number of indexed tokens (<code>SET STOPWORDS FILENAME</code>)
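As a minimal sketch, the options above could be combined in a command script before the full-text index is built. The database name <code>factbook</code> and the input file <code>factbook.xml</code> are hypothetical. The <code>CREATE DB</code> and <code>CREATE INDEX FULLTEXT</code> commands and the <code>#</code> comment syntax are assumptions about the installed version and should be checked against its command reference:

```
# Assumption: options must be set before the index is created
SET STEMMING ON
SET CASESENS ON

# Hypothetical database name and input file
CREATE DB factbook factbook.xml

# Assumption: builds the full-text index with the options set above
CREATE INDEX FULLTEXT
```

Options that are not set explicitly keep their defaults, so only the options that differ from the defaults need to appear in the script.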

'''Caution:''' The index will only be applied if the activated options are also specified in the query:

'''Index Options:''' Case Sensitive, Stemming ON
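For an index built with these two options, a matching query might look like the following sketch. The path <code>//title</code> and the search term are hypothetical, and the <code>using</code> keyword follows the XQuery Full Text 1.0 syntax; older releases may expect a different keyword for match options:

```
# Index options: Case Sensitive, Stemming ON
# The query repeats both options so that the index can be applied
XQUERY //title[text() contains text 'Databases' using case sensitive using stemming]
```

If one of the two match options were omitted from the query, the query and index options would no longer agree, and by the rule above the index would not be used.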

Dimitar
