Difference between revisions of "Full-Text Module"
Line 31:
  The following options are supported (the introduction on [[Full-Text]] processing gives you equivalent expressions in the XQuery Full-Text notation):
  * {{Code|mode}}: determines the mode how tokens are searched. Allowed values are {{Code|any}}, {{Code|any word}}, {{Code|all}}, {{Code|all words}}, and {{Code|phrase}}. {{Code|any}} is the default search mode.
− * {{Code|fuzzy}}: turns fuzzy querying on or off. Allowed values are
+ * {{Code|fuzzy}}: turns fuzzy querying on or off. Allowed values are {{Code|true}} and {{Code|false}}. By default, fuzzy querying is turned off.
− * {{Code|wildcards}}: turns wildcard querying on or off. Allowed values are
+ * {{Code|wildcards}}: turns wildcard querying on or off. Allowed values are {{Code|true}} and {{Code|false}}. By default, wildcard querying is turned off.
  The following options have been added in {{Version|7.8}}:
  * {{Code|ordered}}: requires that all tokens occur in the order in which they are specified. Allowed values are {{Code|true}} and {{Code|false}}. The default is {{Code|false}}.
Line 41:
  * {{Code|window}}: sets up a window in which all tokens must be located. By default, the option is turned off. It has following sub options:
  ** {{Code|size}}: specifies the size of the window in terms of ''units''.
− ** {{Code|unit}}: can be {{Code|sentences}}, {{Code|sentences}}
+ ** {{Code|unit}}: can be {{Code|sentences}}, {{Code|sentences}} or {{Code|paragraphs}}. The default is {{Code|words}}.
  * {{Code|distance}}: specifies the distance in which tokens must occur. By default, the option is turned off. It has following sub options:
  ** {{Code|min}}: specifies the minimum distance in terms of ''units''. The default is {{Code|0}}.
  ** {{Code|max}}: specifies the maximum distance in terms of ''units''. The default is {{Code|∞}}.
− ** {{Code|unit}}: can be {{Code|words}}, {{Code|sentences}}
+ ** {{Code|unit}}: can be {{Code|words}}, {{Code|sentences}} or {{Code|paragraphs}}. The default is {{Code|words}}.
  |-
  | '''Errors'''
Revision as of 21:23, 17 November 2013
This XQuery Module extends the W3C Full Text Recommendation with some useful functions: the index can be directly accessed, full-text results can be marked with additional elements, or the relevant parts can be extracted. Moreover, the score value, which is generated by the contains text expression, can be explicitly requested from items.
Conventions
All functions in this module are assigned to the http://basex.org/modules/ft namespace, which is statically bound to the ft prefix.
All errors are assigned to the http://basex.org/errors namespace, which is statically bound to the bxerr prefix.
Functions
ft:search
Signatures | ft:search($db as xs:string, $terms as item()*) as text()*
ft:search($db as xs:string, $terms as item()*, $options as item()) as text()*
|
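Summary | Returns all text nodes from the full-text index of the database $db that contain the specified $terms.
The options used for tokenizing the input and building the full-text index will also be applied to the search terms. As an example, if the index terms have been stemmed, the search string will be stemmed as well.
The $options argument can be specified either as an options element:
<options> <key1 value='value1'/> ... </options>
or as a map:
{ "key1": "value1", ... }
The following options are supported (the introduction on Full-Text processing gives you equivalent expressions in the XQuery Full-Text notation):
- mode: determines how tokens are searched. Allowed values are any, any word, all, all words, and phrase. any is the default search mode.
- fuzzy: turns fuzzy querying on or off. Allowed values are true and false. By default, fuzzy querying is turned off.
- wildcards: turns wildcard querying on or off. Allowed values are true and false. By default, wildcard querying is turned off.
The following options have been added in Version 7.8:
- ordered: requires that all tokens occur in the order in which they are specified. Allowed values are true and false. The default is false.
- window: sets up a window in which all tokens must be located. By default, the option is turned off. It has the following sub-options:
  - size: specifies the size of the window in terms of units.
  - unit: can be words, sentences or paragraphs. The default is words.
- distance: specifies the distance in which tokens must occur. By default, the option is turned off. It has the following sub-options:
  - min: specifies the minimum distance in terms of units. The default is 0.
  - max: specifies the maximum distance in terms of units. The default is ∞.
  - unit: can be words, sentences or paragraphs. The default is words.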
|
Errors | BXDB0002: the addressed database does not exist or could not be opened.
BXDB0004: the full-text index is not available.
BXFT0001: the fuzzy and wildcards options cannot both be specified.
|
Examples |
ft:search("db", ("A", "B"), { "mode": "all words", "distance": { "max": "5", "unit": "words" } })
let $terms := "Hello Worlds"
let $fuzzy := true()
let $options := <options><fuzzy value="{ $fuzzy }"/></options>
for $db in 1 to 3
let $dbname := 'DB' || $db
return ft:search($dbname, $terms, $options)/..
|
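As a further sketch, the ordered and window options added in Version 7.8 could be combined in a single options map. The database name db and the search terms are placeholders, and passing all values as strings simply mirrors the first example; this has not been checked against a running instance.

```xquery
(: Assumption: a database named "db" with a full-text index exists. :)
(: Assumption: option values may be passed as strings, as in the example above. :)
ft:search("db", ("A", "B"), {
  "mode": "all",
  "ordered": "true",
  "window": { "size": "3", "unit": "sentences" }
})
```

If the options are accepted in this form, both terms would have to occur in the given order and within a window of three sentences.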
ft:contains
Signatures | ft:contains($input as item()*, $terms as item()*) as xs:boolean
ft:contains($input as item()*, $terms as item()*, $options as item()) as xs:boolean
|
Summary | Checks if the specified $input items contain the specified $terms.
The function does the same as the Full-Text expression contains text, but options can be specified more dynamically. The $options are the same as for ft:search, and the following ones in addition:
|
Errors | BXFT0001: the fuzzy and wildcards options cannot both be specified.
|
Examples |
ft:contains("John Doe", ("jack", "john"), { "mode": "any" })
(true(), false()) ! ft:contains("Häuser", "Haus", { 'stemming': ., 'language':'de' }) |
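Since the function is described as doing the same as the contains text expression, the two forms can be placed side by side. The following is a minimal sketch with made-up input; both items of the resulting sequence are expected to be the same boolean value.

```xquery
let $input := "John Doe"
let $terms := ("jack", "john")
return (
  (: static Full-Text notation: the search mode is part of the query text :)
  $input contains text { $terms } any,
  (: function call: the search mode is supplied as a value at runtime :)
  ft:contains($input, $terms, { "mode": "any" })
)
```

The function form is useful when the options are only known at runtime, for example when they are read from a request parameter.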
ft:mark
Signatures | ft:mark($nodes as node()*) as node()*
ft:mark($nodes as node()*, $tag as xs:string) as node()*
|
Summary | Puts a marker element around the resulting $nodes of a full-text index request.
The default tag name of the marker element is mark. An alternative tag name can be chosen via the optional $tag argument.
Please note that:
|
Examples | Example 1: The following query returns <XML><mark>hello</mark> world</XML>, if one text node of the database DB has the value "hello world":
ft:mark(db:open('DB')//*[text() contains text 'hello'])
Example 2: The following expression returns <p><b>word</b></p>:
copy $p := <p>word</p>
modify ()
return ft:mark($p[text() contains text 'word'], 'b')
Example 3: The following expression loops through the first ten full-text results and marks the results in a second expression:
let $start := 1
let $end := 10
let $term := 'welcome'
for $ft in (db:open('DB')//*[text() contains text { $term }])[position() = $start to $end]
return element hit {
  ft:mark($ft[text() contains text { $term }])
}
|
ft:extract
Signatures | ft:extract($nodes as node()*) as node()*
ft:extract($nodes as node()*, $tag as xs:string) as node()*
ft:extract($nodes as node()*, $tag as xs:string, $length as xs:integer) as node()*
|
Summary | Extracts and returns relevant parts of full-text results. It puts a marker element around the resulting $nodes of a full-text index request and chops irrelevant sections of the result.
The default tag name of the marker element is mark. An alternative tag name can be chosen via the optional $tag argument.
The default length of the returned text is 150 characters. An alternative length can be specified via the optional $length argument. Note that the effective text length may differ from the specified length due to formatting and readability issues.
For more details on this function, please have a look at ft:mark. |
Examples |
ft:extract(db:open('DB')//*[text() contains text 'hello'], 'b', 1) |
ft:count
Signatures | ft:count($nodes as node()*) as xs:integer
|
Summary | Returns the number of occurrences of the search terms specified in a full-text expression. |
Examples |
|
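A possible call of ft:count, sketched by analogy with the ft:mark examples. The database name DB and the search term are placeholders; the full-text expression is passed directly as the argument, in the same way as in the ft:mark examples.

```xquery
(: Assumption: a database named "DB" with a full-text index exists. :)
(: Counts the occurrences of the search term in the matching nodes. :)
ft:count(db:open('DB')//*[text() contains text 'hello'])
```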
ft:score
Signatures | ft:score($item as item()*) as xs:double*
|
Summary | Returns the score values (0.0 - 1.0) that have been attached to the specified items. 0 is returned as value if no score was attached.
|
Examples |
|
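A minimal sketch of requesting a score explicitly. The input string is made up, and the call passes the result of a contains text expression straight to the function; the concrete score value depends on the scoring model and is therefore not shown.

```xquery
(: Requests the score that the contains text expression attached to its result. :)
ft:score("hello world" contains text "hello")
```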
ft:tokens
Signatures | ft:tokens($db as xs:string) as element(value)*
ft:tokens($db as xs:string, $prefix as xs:string) as element(value)*
|
Summary | Returns all full-text tokens stored in the index of the database $db, along with their numbers of occurrences.
If $prefix is specified, the returned nodes will be restricted to the strings starting with that prefix. The prefix will be tokenized according to the full-text options used for creating the index.
|
Errors | BXDB0002: the addressed database does not exist or could not be opened.
BXDB0004: the full-text index is not available.
|
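A short sketch of both signatures. The database name DB and the prefix hel are placeholders; the exact shape of the returned value elements is not shown here, only their string values are requested in the second call.

```xquery
(: Assumption: a database named "DB" with a full-text index exists. :)
(
  (: all tokens of the full-text index :)
  ft:tokens("DB"),
  (: only the tokens starting with the given prefix, as plain strings :)
  ft:tokens("DB", "hel") ! string()
)
```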
ft:tokenize
Signatures | ft:tokenize($input as xs:string) as xs:string*
|
Summary | Tokenizes the given $input string, using the current default full-text options.
|
Examples |
|
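A minimal sketch of ft:tokenize with a made-up input string. The resulting tokens depend on the current default full-text options, so no output is listed here.

```xquery
(: Splits the input into tokens, using the current default full-text options. :)
for $token in ft:tokenize("No Doubt")
return $token
```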
Errors
Code | Description |
---|---|
BXFT0001 | Both wildcards and fuzzy search have been specified as search options. |
Changelog
- Version 7.8
  - Added: ft:contains
  - Updated: Options added to ft:search
- Version 7.7
  - Updated: the functions no longer accept Database Nodes as reference. Instead, the name of a database must now be specified.
- Version 7.2
  - Updated: ft:search (second argument generalized, third parameter added)
- Version 7.1
  - Added: ft:tokens, ft:tokenize