Revision as of 14:38, 16 April 2019

This XQuery Module extends the W3C Full Text Recommendation with some useful functions: the index can be accessed directly, full-text results can be marked with additional elements, or the relevant parts can be extracted. Moreover, the score value, which is generated by the contains text expression, can be explicitly requested from items.

Conventions

All functions and errors in this module are assigned to the http://basex.org/modules/ft namespace, which is statically bound to the ft prefix.

Functions

ft:search

Signatures	`ft:search($db as xs:string, $terms as item()*) as text()*` `ft:search($db as xs:string, $terms as item()*, $options as map(*)?) as text()*`
Summary	Returns all text nodes from the full-text index of the database `$db` that contain the specified `$terms`. The options used for tokenizing the input and building the full-text index will also be applied to the search terms. As an example, if the index terms have been stemmed, the search string will be stemmed as well. The `$options` argument can be used to control full-text processing. The following options are supported (the introduction on Full-Text processing gives you equivalent expressions in the XQuery Full-Text notation):

`mode`: determines how tokens are searched. Allowed values are `any`, `any word`, `all`, `all words`, and `phrase`. `any` is the default search mode.
`fuzzy`: turns fuzzy querying on or off. Allowed values are `true` and `false`. By default, fuzzy querying is turned off.
`wildcards`: turns wildcard querying on or off. Allowed values are `true` and `false`. By default, wildcard querying is turned off.
`ordered`: requires that all tokens occur in the order in which they are specified. Allowed values are `true` and `false`. The default is `false`.
`content`: specifies that the matched tokens need to occur at the beginning or end of a searched string, or need to cover the entire string. Allowed values are `start`, `end`, and `entire`. By default, the option is turned off.
`scope`: defines the scope in which tokens must be located. The option has the following sub-options:
  `same`: can be set to `true` or `false`. It specifies whether tokens need to occur in the same or in different units.
  `unit`: can be `sentence` or `paragraph`. It specifies the unit for finding tokens.
`window`: sets up a window in which all tokens must be located. By default, the option is turned off. It has the following sub-options:
  `size`: specifies the size of the window in terms of units.
  `unit`: can be `words`, `sentences` or `paragraphs`. The default is `words`.
`distance`: specifies the distance in which tokens must occur. By default, the option is turned off. It has the following sub-options:
  `min`: specifies the minimum distance in terms of units. The default is `0`.
  `max`: specifies the maximum distance in terms of units. The default is `∞`.
  `unit`: can be `words`, `sentences` or `paragraphs`. The default is `words`.
Errors	`db:open`: the addressed database does not exist or could not be opened. `db:no-index`: the full-text index is not available. `options`: the `fuzzy` and `wildcards` options cannot both be specified.
Examples	`ft:search("DB", "QUERY")`: Return all text nodes of the database `DB` that contain the term `QUERY`.

Return all text nodes of the database `DB` that contain the numbers `2010` and `2020`:

ft:search("DB", ("2010", "2020"), map { 'mode': 'all' })

Return text nodes that contain the terms `A` and `B` in a distance of at most 5 words:

ft:search("db", ("A", "B"), map {
  "mode": "all words",
  "distance": map {
    "max": "5",
    "unit": "words"
  }
})

Iterate over three databases and return all elements containing terms similar to `Hello World` in the text nodes:

let $terms := "Hello Worlds"
let $fuzzy := true()
for $db in 1 to 3
let $dbname := 'DB' || $db
return ft:search($dbname, $terms, map { 'fuzzy': $fuzzy })/..
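
The `wildcards` and `window` options described in the summary can be sketched in the same way. This is only an illustration: the database name `DB` and the search terms are placeholders, and the database is assumed to have a full-text index.

```xquery
(: Wildcard search: '.*' stands for zero or more arbitrary characters.
   "DB" is a placeholder for a database with a full-text index. :)
ft:search("DB", "quer.*", map { 'wildcards': true() }),

(: Both terms must occur within a window of five words.
   The terms "hello" and "world" are illustrative only. :)
ft:search("DB", ("hello", "world"), map {
  'mode': 'all words',
  'window': map { 'size': '5', 'unit': 'words' }
})
```

The window size is passed as a string here to mirror the `distance` example; whether other atomic types are accepted is not stated on this page.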

ft:contains

Signatures	`ft:contains($input as item()*, $terms as item()*) as xs:boolean` `ft:contains($input as item()*, $terms as item()*, $options as map(*)?) as xs:boolean`
Summary	Checks if the specified `$input` items contain the specified `$terms`. The function does the same as the Full-Text expression `contains text`, but options can be specified more dynamically. The `$options` are the same as for ft:search, and the following ones in addition:

`case`: determines how character case is processed. Allowed values are `insensitive`, `sensitive`, `upper` and `lower`. By default, search is case-insensitive.
`diacritics`: determines how diacritical characters are processed. Allowed values are `insensitive` and `sensitive`. By default, search is diacritics-insensitive.
`stemming`: determines whether tokens are stemmed. Allowed values are `true` and `false`. By default, stemming is turned off.
`language`: determines the language. This option is relevant for stemming tokens. All language codes are supported. The default language is `en`.
Errors	`options`: specified options are conflicting.
Examples	Checks if `jack` or `john` occurs in the input string `John Doe`:

ft:contains("John Doe", ("jack", "john"), map { "mode": "any" })

Calls the function with stemming turned on and off:

(true(), false()) ! ft:contains("Häuser", "Haus", map { 'stemming': ., 'language':'de' })
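
As a small sketch of the additional `case` option, the following query compares a dynamic options map with what should be the corresponding Full-Text notation. The input strings are made up for illustration.

```xquery
(: Case-sensitive comparison: the lower-case term "john" is not expected
   to match the capitalized token "John". :)
ft:contains("John Doe", "john", map { 'case': 'sensitive' }),

(: Roughly equivalent expression in XQuery Full-Text notation. :)
"John Doe" contains text "john" using case sensitive
```

The map variant is useful when the option value is only known at runtime, for example when it comes from a request parameter.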

ft:mark

Signatures ft:mark($nodes as node()*) as node()*
ft:mark($nodes as node()*, $name as xs:string) as node()*

Summary

Puts a marker element around the resulting $nodes of a full-text index request.
The default name of the marker element is mark. An alternative name can be chosen via the optional $name argument.
Please note that:

the full-text expression that computes the token positions must be specified as argument of the ft:mark() function, as all position information is lost in subsequent processing steps. You may need to specify more than one full-text expression if you want to use the function in a FLWOR expression, as shown in Example 2.
the XML node to be transformed must be an internal "database" node. The transform expression can be used to apply the method to a main-memory fragment, as shown in Example 3.

Examples

Example 1: The following query returns <XML><mark>hello</mark> world</XML>, if one text node of the database DB has the value "hello world":

ft:mark(db:open('DB')//*[text() contains text 'hello'])

Example 2: The following expression loops through the first ten full-text results and marks the results in a second expression:

let $start := 1
let $end   := 10
let $term  := 'welcome'
for $ft in (db:open('DB')//*[text() contains text { $term }])[position() = $start to $end]
return element hit {
  ft:mark($ft[text() contains text { $term }])
}

Example 3: The following expression returns <p><b>word</b></p>:

copy $p := <p>word</p>
modify ()
return ft:mark($p[text() contains text 'word'], 'b')

ft:extract

Signatures ft:extract($nodes as node()*) as node()*
ft:extract($nodes as node()*, $name as xs:string) as node()*
ft:extract($nodes as node()*, $name as xs:string, $length as xs:integer) as node()*

Summary Extracts and returns relevant parts of full-text results. It puts a marker element around the resulting $nodes of a full-text index request and chops irrelevant sections of the result.
The default element name of the marker element is mark. An alternative element name can be chosen via the optional $name argument.
The default length of the returned text is 150 characters. An alternative length can be specified via the optional $length argument. Note that the effective text length may differ from the specified length due to formatting and readability issues.
For more details on this function, please have a look at ft:mark.

Examples

The following query may return <XML>...<b>hello</b>...</XML> if a text node of the database DB contains the string "hello world":

ft:extract(db:open('DB')//*[text() contains text 'hello'], 'b', 1)

ft:count

Signatures	`ft:count($nodes as node()*) as xs:integer`
Summary	Returns the number of occurrences of the search terms specified in a full-text expression.
Examples	`ft:count(//*[text() contains text 'QUERY'])` returns the `xs:integer` value `2` if a document contains two occurrences of the string "QUERY".

ft:score

Signatures	`ft:score($item as item()) as xs:double`
Summary	Returns the score values (0.0 - 1.0) that have been attached to the specified items. `0` is returned if no score was attached.
Examples	`ft:score('a' contains text 'a')` returns the `xs:double` value `1`.
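
A typical use of the score is to rank full-text results. The following is a minimal sketch that follows the pattern of the example above: the database name `DB`, the search term and the `hit` element are assumptions made for illustration.

```xquery
(: Rank matching text nodes by their full-text score, best matches first.
   "DB" is a placeholder for an existing database. :)
for $hit in db:open('DB')//text()[. contains text 'hello']
(: The score is attached to the result of the contains text expression. :)
let $score := ft:score($hit contains text 'hello')
order by $score descending
return <hit score='{ $score }'>{ $hit }</hit>
```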

ft:tokens

Signatures	`ft:tokens($db as xs:string) as element(value)*` `ft:tokens($db as xs:string, $prefix as xs:string) as element(value)*`
Summary	Returns all full-text tokens stored in the index of the database `$db`, along with their numbers of occurrences. If `$prefix` is specified, the returned nodes will be restricted to the strings starting with that prefix. The prefix will be tokenized according to the full-text options used for creating the index.
Errors	`db:open`: The addressed database does not exist or could not be opened. `db:no-index`: the full-text index is not available.
Examples	Returns the number of occurrences for a single, specific index entry:

let $term := ft:tokenize($term)
return number(ft:tokens('db', $term)[. = $term]/@count)
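
The `$prefix` argument can also be used to browse the index. The sketch below lists the most frequent tokens for a prefix. It assumes a database named `DB` with a full-text index, and it relies on the `count` attribute used in the example above.

```xquery
(: Ten most frequent index tokens starting with "a".
   "DB" and the prefix "a" are placeholders. :)
(
  for $token in ft:tokens("DB", "a")
  order by xs:integer($token/@count) descending
  return $token
)[position() = 1 to 10]
```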

ft:tokenize

Signatures ft:tokenize($string as xs:string?) as xs:string*
ft:tokenize($string as xs:string?, $options as map(*)?) as xs:string*

Summary

Tokenizes the given $string, using the current default full-text options or the $options specified as second argument, and returns a sequence with the tokenized string. The following options are available:

case: determines how character case is processed. Allowed values are insensitive, sensitive, upper and lower. By default, search is case insensitive.
diacritics: determines how diacritical characters are processed. Allowed values are insensitive and sensitive. By default, search is diacritics-insensitive.
stemming: determines whether tokens are stemmed. Allowed values are true and false. By default, stemming is turned off.
language: determines the language. This option is relevant for stemming tokens. All language codes are supported. The default language is en.

The $options argument can be used to control full-text processing.

Examples

ft:tokenize("No Doubt") returns the two strings no and doubt.
ft:tokenize("École", map { 'diacritics': 'sensitive' }) returns the string école.
declare ft-option using stemming; ft:tokenize("GIFTS") returns a single string gift.
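
The options can also be passed in the map instead of the query prolog. The following sketch assumes that the `stemming` option in the map has the same effect as the `declare ft-option` declaration in the last example.

```xquery
(: Stemming requested via the options map instead of the query prolog. :)
ft:tokenize("GIFTS", map { 'stemming': true(), 'language': 'en' }),

(: Keep the original character case of the tokens. :)
ft:tokenize("No Doubt", map { 'case': 'sensitive' })
```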

ft:normalize

Signatures	`ft:normalize($string as xs:string?) as xs:string` `ft:normalize($string as xs:string?, $options as map(*)?) as xs:string`
Summary	Normalizes the given `$string`, using the current default full-text options or the `$options` specified as second argument. The function expects the same arguments as ft:tokenize.
Examples	`ft:normalize("Häuser am Meer", map { 'case': 'sensitive' })` returns the string `Hauser am Meer`.
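
To illustrate the difference from ft:tokenize, the following sketch applies both functions to the same input with the default options. According to the signatures, ft:tokenize yields a sequence of strings, whereas ft:normalize yields a single string.

```xquery
let $input := "Häuser am Meer"
return (
  (: number of tokens returned as separate strings :)
  count(ft:tokenize($input)),
  (: one normalized string :)
  ft:normalize($input)
)
```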

Errors

Code	Description
`options`	Both wildcards and fuzzy search have been specified as search options.
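
Since version 9.0 the error uses the module namespace, so it should be possible to catch it with the `ft` prefix. The following is a sketch only: the input strings are made up, and the message text is an assumption.

```xquery
(: Specifying both fuzzy and wildcards is expected to raise ft:options. :)
try {
  ft:contains("hello world", "hel.*", map { 'fuzzy': true(), 'wildcards': true() })
} catch ft:options {
  (: $err:description holds the message supplied by the processor. :)
  "Conflicting full-text options: " || $err:description
}
```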

Changelog

Version 9.1

Updated: ft:tokenize and ft:normalize can be called with empty sequence.

Version 9.0

Updated: error codes updated; errors now use the module namespace

Version 8.0

Added: ft:contains, ft:normalize
Updated: Options added to ft:tokenize

Version 7.8

Added: ft:contains
Updated: Options added to ft:search

Version 7.7

Updated: the functions no longer accept Database Nodes as reference. Instead, the name of a database must now be specified.

Version 7.2

Updated: ft:search (second argument generalized, third parameter added)

Version 7.1

Added: ft:tokens, ft:tokenize

Difference between revisions of "Full-Text Module"

@@ Line 1: / Line 1: @@
-This [[Module Library|XQuery Module]] extends the [http://www.w3.org/TR/xpath-full-text-10 W3C Full Text Recommendation] with some useful functions: The index can be directly accessed, full-text results can be marked with additional elements, or the relevant parts can be extracted. Moreover, the score value, which is generated by the {{Code|contains text}} expression, can be explicitly requested from items.
+This [[Module Library|XQuery Module]] extends the [http://www.w3.org/TR/xpath-full-text-10 W3C Full Text Recommendation] with some useful functions: The index can be directly accessed, fulltext results can be marked with additional elements, or the relevant parts can be extracted. Moreover, the score value, which is generated by the {{Code|contains text}} expression, can be explicitly requested from items.
 =Conventions=
-All functions in this module are assigned to the {{Code|http://basex.org/modules/ft}} namespace, which is statically bound to the {{Code|ft}} prefix.<br/>
+All functions and errors in this module are assigned to the <code><nowiki>http://basex.org/modules/ft</nowiki></code> namespace, which is statically bound to the {{Code|ft}} prefix.<br/>
-All errors are assigned to the {{Code|http://basex.org/errors}} namespace, which is statically bound to the {{Code|bxerr}} prefix.
 =Functions=
@@ Line 13: / Line 12: @@
 |-
 | width='120' | '''Signatures'''
-|{{Func|ft:search|$db as xs:string, $terms as item()*|text()*}}<br/>{{Func|ft:search|$db as xs:string, $terms as item()*, $options as item()|text()*}}
+|{{Func|ft:search|$db as xs:string, $terms as item()*|text()*}}<br/>{{Func|ft:search|$db as xs:string, $terms as item()*, $options as map(*)?|text()*}}
 |-
 | '''Summary'''
 |Returns all text nodes from the full-text index of the database {{Code|$db}} that contain the specified {{Code|$terms}}.<br/>The options used for tokenizing the input and building the full-text will also be applied to the search terms. As an example, if the index terms have been stemmed, the search string will be stemmed as well.
-The {{Code|$options}} argument can be used to control full-text processing. Options can be either specified<br/>
+The {{Code|$options}} argument can be used to control full-text processing. The following options are supported (the introduction on [[Full-Text]] processing gives you equivalent expressions in the XQuery Full-Text notation):
-* as children of an {{Code|&lt;options/&gt;}} element, e.g.:
-<pre class="brush:xml">
-<options>
-  <key1 value='value1'/>
-  ...
-</options>
-</pre>
-* as map, which contains all key/value pairs:
-<pre class="brush:xml">
-{ "key1": "value1", ... }
-</pre>
-The following options are supported (the introduction on [[Full-Text]] processing gives you equivalent expressions in the XQuery Full-Text notation):
 * {{Code|mode}}: determines the mode how tokens are searched. Allowed values are {{Code|any}}, {{Code|any word}}, {{Code|all}}, {{Code|all words}}, and {{Code|phrase}}. {{Code|any}} is the default search mode.
-* {{Code|fuzzy}}: turns fuzzy querying on or off. Allowed values are an empty string or {{Code|true}}, or {{Code|false}}. By default, fuzzy querying is turned off.
+* {{Code|fuzzy}}: turns fuzzy querying on or off. Allowed values are {{Code|true}} and {{Code|false}}. By default, fuzzy querying is turned off.
-* {{Code|wildcards}}: turns wildcard querying on or off. Allowed values are an empty string or {{Code|true}}, or {{Code|false}}. By default, wildcard querying is turned off.
+* {{Code|wildcards}}: turns wildcard querying on or off. Allowed values are {{Code|true}} and {{Code|false}}. By default, wildcard querying is turned off.
-The following options have been added in {{Version|7.8}}:
 * {{Code|ordered}}: requires that all tokens occur in the order in which they are specified. Allowed values are {{Code|true}} and {{Code|false}}. The default is {{Code|false}}.
 * {{Code|content}}: specifies that the matched tokens need to occur at the beginning or end of a searched string, or need to cover the entire string. Allowed values are {{Code|start}}, {{Code|end}}, and {{Code|entire}}. By default, the option is turned off.
@@ Line 41: / Line 27: @@
 * {{Code|window}}: sets up a window in which all tokens must be located. By default, the option is turned off. It has following sub options:
 ** {{Code|size}}: specifies the size of the window in terms of ''units''.
-** {{Code|unit}}: can be {{Code|sentences}}, {{Code|sentences}}, or {{Code|paragraphs}}. The default is {{Code|words}}.
+** {{Code|unit}}: can be {{Code|sentences}}, {{Code|sentences}} or {{Code|paragraphs}}. The default is {{Code|words}}.
 * {{Code|distance}}: specifies the distance in which tokens must occur. By default, the option is turned off. It has following sub options:
 ** {{Code|min}}: specifies the minimum distance in terms of ''units''. The default is {{Code|0}}.
 ** {{Code|max}}: specifies the maximum distance in terms of ''units''. The default is {{Code|∞}}.
-** {{Code|unit}}: can be {{Code|words}}, {{Code|sentences}}, or {{Code|paragraphs}}. The default is {{Code|words}}.
+** {{Code|unit}}: can be {{Code|words}}, {{Code|sentences}} or {{Code|paragraphs}}. The default is {{Code|words}}.
 |-
 | '''Errors'''
-|{{Error|BXDB0002|XQuery Errors#BaseX Errors}} The addressed database does not exist or could not be opened.<br/>{{Error|BXDB0004|Database Module#Errors}} the full-text index is not available.<br/>{{Error|BXFT0001|#Errors}} the fuzzy and wildcard option cannot be both specified.
+|{{Error|db:open|Database Module#Errors}} The addressed database does not exist or could not be opened.<br/>{{Error|db:no-index|Database Module#Errors}} the index is not available.<br/>{{Error|options|#Errors}} the fuzzy and wildcard option cannot be both specified.
 |-
 | '''Examples'''
 |
 * {{Code|ft:search("DB", "QUERY")}}: Return all text nodes of the database {{Code|DB}} that contain the term {{Code|QUERY}}.
-* Return all text nodes of the database {{Code|DB}} that contain the numbers {{Code|2010}} and {{Code|2011}}:<br/><code>ft:search("DB", ("2010","2011"), { 'mode': 'all' })</code>
+* Return all text nodes of the database {{Code|DB}} that contain the numbers {{Code|2010}} and {{Code|2020}}:<br/><code>ft:search("DB", ("2010", "2020"), map { 'mode': 'all' })</code>
 * Return text nodes that contain the terms {{Code|A}} and {{Code|B|}} in a distance of at most 5 words:
 <pre class="brush:xquery">
-ft:search("db", ("A", "B"), {
+ft:search("db", ("A", "B"), map {
    "mode": "all words",
-   "distance": {
+   "distance": map {
      "max": "5",
      "unit": "words"
@@ Line 68: / Line 54: @@
 let $terms := "Hello Worlds"
 let $fuzzy := true()
-let $options := <options><fuzzy value="{ $fuzzy }"/></options>
 for $db in 1 to 3
 let $dbname := 'DB' || $db
-return ft:search($dbname, $terms, $options)/..
+return ft:search($dbname, $terms, map { 'fuzzy': $fuzzy })/..
 </pre>
 |}
 ==ft:contains==
-{{Mark|Introduced with Version 7.8:}}
 {| width='100%'
 |-
 | width='120' | '''Signatures'''
-|{{Func|ft:contains|$input as item()*, $terms as item()*|xs:boolean}}<br/>{{Func|ft:search|$input as item()*, $terms as item()*, $options as item()|xs:boolean}}
+|{{Func|ft:contains|$input as item()*, $terms as item()*|xs:boolean}}<br/>{{Func|ft:contains|$input as item()*, $terms as item()*, $options as map(*)?|xs:boolean}}
 |-
 | '''Summary'''
-|Checks if the specified {{Code|$input}} items contain the specified {{Code|$terms}}.<br/>The function is similar to the [[Full-Text]] expression {{Code|contains text}}, but the processing {{Code|$options}} can be dynamically specified:<br/>
+|Checks if the specified {{Code|$input}} items contain the specified {{Code|$terms}}.<br/>The function does the same as the [[Full-Text]] expression {{Code|contains text}}, but options can be specified more dynamically. The {{Code|$options}} are the same as for [[#ft:search|ft:search]], and the following ones in addition:
-* as children of an {{Code|&lt;options/&gt;}} element, e.g.:
+* {{Code|case}}: determines how character case is processed. Allowed values are {{Code|insensitive}}, {{Code|sensitive}}, {{Code|upper}} and {{Code|lower}}. By default, search is case insensitive.
-<pre class="brush:xml">
+* {{Code|diacritics}}: determines how diacritical characters are processed. Allowed values are {{Code|insensitive}} and {{Code|sensitive}}. By default, search is diacritical insensitive.
-<options>
-  <key1 value='value1'/>
-  ...
-</options>
-</pre>
-* as map, which contains all key/value pairs:
-<pre class="brush:xml">
-{ "key1": "value1", ... }
-</pre>
-The function supports the same options as the [[ft:search|ft:search]] function and some more:
-* {{Code|case}}: determines how character case is processed. Allowed values {{Code|insensitive}}, {{Code|sensitive}}, {{Code|upper}} and {{Code|lower}}. By default, search is case insensitive.
-* {{Code|diacritics}}: determines how diacritical characters are processed. Allowed values {{Code|insensitive}} and {{Code|sensitive}}. By default, search is diacritical insensitive.
 * {{Code|stemming}}: determines is tokens are stemmed. Allowed values are {{Code|true}} and {{Code|false}}. By default, stemming is turned off.
-* {{Code|language}}: determines the language. This option is relevant for stemming tokens. All language codes are supported. By default, the language option is not set.
+* {{Code|language}}: determines the language. This option is relevant for stemming tokens. All language codes are supported. The default language is {{Code|en}}.
 |-
 | '''Errors'''
-|{{Error|BXFT0001|#Errors}} the fuzzy and wildcard option cannot be both specified.
+|{{Error|options|#Errors}} specified options are conflicting.
 |-
 | '''Examples'''
@@ Line 110: / Line 81: @@
 * Checks if {{Code|jack}} or {{Code|john}} occurs in the input string {{Code|John Doe}}:
 <pre class="brush:xquery">
-ft:contains("John Doe", ("jack", "john"), { "mode": "any" })
+ft:contains("John Doe", ("jack", "john"), map { "mode": "any" })
+</pre>
+* Calls the function with stemming turned on and off:
+<pre class="brush:xquery">
+(true(), false()) ! ft:contains("Häuser", "Haus", map { 'stemming': ., 'language':'de' })
 </pre>
 |}
 ==ft:mark==
 {| width='100%'
 |-
 | width='120' | '''Signatures'''
-|{{Func|ft:mark|$nodes as node()*|node()*}}<br />{{Func|ft:mark|$nodes as node()*, $tag as xs:string|node()*}}
+|{{Func|ft:mark|$nodes as node()*|node()*}}<br />{{Func|ft:mark|$nodes as node()*, $name as xs:string|node()*}}
 |-
 | '''Summary'''
-|Puts a marker element around the resulting {{Code|$nodes}} of a full-text index request.<br />The default tag name of the marker element is {{Code|mark}}. An alternative tag name can be chosen via the optional {{Code|$tag}} argument.<br />Please note that:
+|Puts a marker element around the resulting {{Code|$nodes}} of a full-text index request.<br />The default name of the marker element is {{Code|mark}}. An alternative name can be chosen via the optional {{Code|$name}} argument.<br />Please note that:
-* the XML node to be transformed must be an internal "database" node. The {{Code|transform}} expression can be used to apply the method to a main-memory fragment, as shown in Example 2.
+* the full-text expression that computes the token positions must be specified as argument of the <code>ft:mark()</code> function, as all position information is lost in subsequent processing steps. You may need to specify more than one full-text expression if you want to use the function in a FLWOR expression, as shown in Example 2.
-* the full-text expression, which computes the token positions, must be specified within <code>ft:mark()</code> function, as all position information is lost in subsequent processing steps. You may need to specify more than one full-text expression if you want to use the function in a FLWOR expression, as shown in Example 3.
+* the XML node to be transformed must be an internal "database" node. The {{Code|transform}} expression can be used to apply the method to a main-memory fragment, as shown in Example 3.
 |-
 | '''Examples'''
@@ Line 130: / Line 106: @@
 ft:mark(db:open('DB')//*[text() contains text 'hello'])
 </pre>
-'''Example 2''': The following expression returns {{Code|&lt;p&gt;&lt;b&gt;word&lt;/b&gt;&lt;/p&gt;}}:
+'''Example 2''': The following expression loops through the first ten full-text results and marks the results in a second expression:
-<pre class="brush:xquery">
-copy $p := &lt;p&gt;word&lt;/p&gt;
-modify ()
-return ft:mark($p[text() contains text 'word'], 'b')</pre>
-'''Example 3''': The following expression loops through the first ten full-text results and marks the results in a second expression:
 <pre class="brush:xquery">
 let $start := 1
@@ Line 145: / Line 116: @@
 }
 </pre>
+'''Example 3''': The following expression returns {{Code|&lt;p&gt;&lt;b&gt;word&lt;/b&gt;&lt;/p&gt;}}:
+<pre class="brush:xquery">
+copy $p := &lt;p&gt;word&lt;/p&gt;
+modify ()
+return ft:mark($p[text() contains text 'word'], 'b')</pre>
 |}
 ==ft:extract==
 {| width='100%'
 |-
 | width='120' | '''Signatures'''
-|{{Func|ft:extract|$nodes as node()*|node()*}}<br />{{Func|ft:extract|$nodes as node()*, $tag as xs:string|node()*}}<br />{{Func|ft:extract|$nodes as node()*, $tag as xs:string, $length as xs:integer|node()*}}
+|{{Func|ft:extract|$nodes as node()*|node()*}}<br />{{Func|ft:extract|$nodes as node()*, $name as xs:string|node()*}}<br />{{Func|ft:extract|$nodes as node()*, $name as xs:string, $length as xs:integer|node()*}}
 |-
 | '''Summary'''
-|Extracts and returns relevant parts of full-text results. It puts a marker element around the resulting {{Code|$nodes}} of a full-text index request and chops irrelevant sections of the result.<br />The default tag name of the marker element is {{Code|mark}}. An alternative tag name can be chosen via the optional {{Code|$tag}} argument.<br />The default length of the returned text is {{Code|150}} characters. An alternative length can be specified via the optional {{Code|$length}} argument. Note that the effective text length may differ from the specified text due to formatting and readibility issues.<br />For more details on this function, please have a look at [[#ft:mark|ft:mark]].
+|Extracts and returns relevant parts of full-text results. It puts a marker element around the resulting {{Code|$nodes}} of a full-text index request and chops irrelevant sections of the result.<br />The default element name of the marker element is {{Code|mark}}. An alternative element name can be chosen via the optional {{Code|$name}} argument.<br />The default length of the returned text is {{Code|150}} characters. An alternative length can be specified via the optional {{Code|$length}} argument. Note that the effective text length may differ from the specified text due to formatting and readibility issues.<br />For more details on this function, please have a look at [[#ft:mark|ft:mark]].
 |-
 | '''Examples'''
@@ Line 179: / Line 156: @@
 ==ft:score==
 {| width='100%'
 |-
@@ Line 202: / Line 180: @@
 |-
 | '''Errors'''
-|{{Error|BXDB0002|XQuery Errors#BaseX Errors}} The addressed database does not exist or could not be opened.<br/>{{Error|BXDB0004|Database Module#Errors}} the full-text index is not available.
+|{{Error|db:open|Database Module#Errors}} The addressed database does not exist or could not be opened.<br/>{{Error|db:no-index|Database Module#Errors}} the full-text index is not available.
+|-
+| '''Examples'''
+|Returns the number of occurrences for a single, specific index entry:
+<pre class="brush:xquery">
+let $term := ft:tokenize($term)
+return number(ft:tokens('db', $term)[. = $term]/@count)
+</pre>
 |}
 ==ft:tokenize==
 {| width='100%'
 |-
 | width='120' | '''Signatures'''
-|{{Func|ft:tokenize|$input as xs:string|xs:string*}}
+|{{Func|ft:tokenize|$string as xs:string?|xs:string*}}<br/>{{Func|ft:tokenize|$string as xs:string?, $options as map(*)?|xs:string*}}
 |-
 | '''Summary'''
-|Tokenizes the given {{Code|$input}} string, using the current default full-text options.
+|Tokenizes the given {{Code|$string}}, using the current default full-text options or the {{Code|$options}} specified as second argument, and returns a sequence with the tokenized string. The following options are available:
+* {{Code|case}}: determines how character case is processed. Allowed values are {{Code|insensitive}}, {{Code|sensitive}}, {{Code|upper}} and {{Code|lower}}. By default, search is case insensitive.
+* {{Code|diacritics}}: determines how diacritical characters are processed. Allowed values are {{Code|insensitive}} and {{Code|sensitive}}. By default, search is diacritical insensitive.
+* {{Code|stemming}}: determines is tokens are stemmed. Allowed values are {{Code|true}} and {{Code|false}}. By default, stemming is turned off.
+* {{Code|language}}: determines the language. This option is relevant for stemming tokens. All language codes are supported. The default language is {{Code|en}}.
+The {{Code|$options}} argument can be used to control full-text processing.
 |-
 | '''Examples'''
 |
-* {{Code|ft:tokenize("No Doubt")}} returns the two strings {{Code|no}} and {{Code|doubt}}.
+* <code>ft:tokenize("No Doubt")</code> returns the two strings {{Code|no}} and {{Code|doubt}}.
-* {{Code|declare ft-option using stemming; ft:tokenize("GIFTS")}} returns a single string {{Code|gift}}.
+* <code>ft:tokenize("École", map { 'diacritics': 'sensitive' })</code> returns the string {{Code|école}}.
+* <code>declare ft-option using stemming; ft:tokenize("GIFTS")</code> returns a single string {{Code|gift}}.
+|}
+==ft:normalize==
+{| width='100%'
+|-
+| width='120' | '''Signatures'''
+|{{Func|ft:normalize|$string as xs:string?|xs:string}}<br/>{{Func|ft:normalize|$string as xs:string?, $options as map(*)?|xs:string}}
+|-
+| '''Summary'''
+|Normalizes the given {{Code|$string}}, using the current default full-text options or the {{Code|$options}} specified as second argument. The function expects the same arguments as [[#ft:tokenize|ft:tokenize]].
+|-
+| '''Examples'''
+|
+* <code>ft:tokenize("Häuser am Meer", map { 'case': 'sensitive' })</code> returns the string {{Code|Hauser am Meer}}.
 |}
@@ Line 226: / Line 233: @@
 |Description
 |-
-|{{Code|BXFT0001}}
+|{{Code|options}}
 |Both wildcards and fuzzy search have been specified as search options.
 |}
 =Changelog=
+; Version 9.1
+* Updated: [[#ft:tokenize|ft:tokenize]] and [[#ft:normalize|ft:normalize]] can be called with empty sequence.
+;Version 9.0
+* Updated: error codes updated; errors now use the module namespace
+;Version 8.0
+* Added: [[#ft:contains|ft:contains]], [[#ft:normalize|ft:normalize]]
+* Updated: Options added to [[#ft:tokenize|ft:tokenize]]
+;Version 7.8
+* Added: [[#ft:contains|ft:contains]]
+* Updated: Options added to [[#ft:search|ft:search]]
 ;Version 7.7
@@ Line 243: / Line 267: @@
 * Added: [[#ft:tokens|ft:tokens]], [[#ft:tokenize|ft:tokenize]]
-[[Category:XQuery]]