Difference between revisions of "Full-Text Module"
Revision as of 00:42, 26 May 2012
This XQuery Module extends the W3C Full Text Recommendation with some useful functions: the index can be directly accessed, full-text results can be marked with additional elements, or the relevant parts can be extracted. Moreover, the score value, which is generated by the contains text expression, can be explicitly requested from items.
Conventions
All functions in this module are assigned to the http://basex.org/modules/ft namespace, which is statically bound to the ft prefix.
All errors are assigned to the http://basex.org/errors namespace, which is statically bound to the bxerr prefix.
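As an illustration of the two prefixes, the following sketch calls a function of this module and catches one of its errors. It assumes that the XQuery 3.0 try/catch expression is available in the processor, and DB is a hypothetical database name:

```xquery
(: Sketch only: 'DB' is a hypothetical database name. :)
try {
  (: ft is statically bound, so no namespace declaration is needed :)
  ft:search('DB', 'hello')
} catch bxerr:BXDB0004 {
  (: reached if no full-text index is available for 'DB' :)
  ()
}
```

Returning the empty sequence in the catch clause is just one possible choice; a query could equally report the missing index to the caller.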
Functions
ft:search
Updated with Version 7.2: second argument generalized, third parameter added.
Signatures | ft:search($db as item(), $terms as item()*) as text()*
ft:search($db as item(), $terms as item()*, $options as item()) as text()*
|
Summary | Returns all text nodes from the full-text index of the database node $db that contain the specified $terms. The options used for building the full-text index will also be applied to the search terms. As an example, if the index terms have been stemmed, the search string will be stemmed as well.
The $options argument can be used to overwrite the default full-text options. It can be specified as an element of the following form:
<options> <key>value</key> ... </options>
The following keys are supported:
- wildcards: turns wildcard querying on or off. Allowed values are an empty string or true, or false. By default, wildcard querying is turned off.
|
Errors | BXDB0004 is raised if the full-text index is not available. BASX0022 is raised if both fuzzy and wildcard querying have been selected. |
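Examples |
- ft:search("DB", "QUERY") returns all text nodes of the database DB that contain the term QUERY.
- The following query runs a fuzzy search for the given terms in three databases (DB1 to DB3) and returns the parent nodes of all matching text nodes:
let $terms := "Hello Worlds"
let $fuzzy := true()
let $options := <options> <fuzzy>{ $fuzzy }</fuzzy> </options>
for $db in 1 to 3
let $dbname := 'DB' || $db
return ft:search($dbname, $terms, $options)/..
|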
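The wildcards key can be used in the same way as the fuzzy key. The following is a minimal sketch, assuming a hypothetical database DB with a full-text index and the wildcard syntax of the W3C Full Text Recommendation:

```xquery
(: Sketch only: 'DB' is a hypothetical database with a full-text index. :)
let $options := <options><wildcards>true</wildcards></options>
(: '.*' is the Full Text wildcard for any number of characters :)
return ft:search('DB', 'quer.*', $options)
```

Fuzzy and wildcard querying should not be enabled in the same options element, as this combination raises BASX0022.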
ft:mark
Signatures | ft:mark($nodes as node()*) as node()*
ft:mark($nodes as node()*, $tag as xs:string) as node()*
|
Summary | Puts a marker element around the resulting $nodes of a full-text index request. The default tag name of the marker element is mark. An alternative tag name can be chosen via the optional $tag argument. Note that the XML node to be transformed must be an internal "database" node. The transform expression can be used to apply the method to a main-memory fragment (see example).
|
Examples |
- The following query returns <XML><mark>hello</mark> world</XML>, if one text node of the database DB has the value "hello world":
ft:mark(db:open('DB')//*[text() contains text 'hello'])
- The following transform expression applies the function to a main-memory fragment:
copy $p := <p>word</p>
modify ()
return ft:mark($p[text() contains text 'word'], 'b')
|
ft:extract
Signatures | ft:extract($nodes as node()*) as node()*
ft:extract($nodes as node()*, $tag as xs:string) as node()*
ft:extract($nodes as node()*, $tag as xs:string, $length as xs:integer) as node()*
|
Summary | Extracts and returns relevant parts of full-text results. It puts a marker element around the resulting $nodes of a full-text index request and chops irrelevant sections of the result. The default tag name of the marker element is mark. An alternative tag name can be chosen via the optional $tag argument. The default length of the returned text is 150 characters. An alternative length can be specified via the optional $length argument. Note that the effective text length may differ from the specified length due to formatting and readability issues.
|
Examples |
- The following query may return <XML>...<b>hello</b>...</XML> if a text node of the database DB contains the string "hello world":
ft:extract(db:open('DB')//*[text() contains text 'hello'], 'b', 1)
|
ft:count
Signatures | ft:count($nodes as node()*) as xs:integer
|
Summary | Returns the number of occurrences of the search terms specified in a full-text expression. |
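Examples |
- ft:count(//*[text() contains text 'QUERY']) returns the xs:integer value 2 if a document contains two occurrences of the string "QUERY".
|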
ft:score
Signatures | ft:score($item as item()*) as xs:double*
|
Summary | Returns the score values (0.0 - 1.0) that have been attached to the specified items. 0 is returned as value if no score was attached.
|
Examples |
- ft:score('a' contains text 'a') returns the xs:double value 1.
|
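Since the argument is declared as item()*, several items can be passed at once, and one score is returned per item. A small sketch of this, using two ad-hoc contains text expressions as input:

```xquery
(: Sketch only: requests the scores of two full-text results in one call. :)
ft:score((
  'hello world' contains text 'hello',
  'hello world' contains text 'bye'
))
```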
ft:tokens
Signatures | ft:tokens($db as item()) as element(value)*
ft:tokens($db as item(), $prefix as xs:string) as element(value)*
|
Summary | Returns all full-text tokens stored in the index of the database node $db, along with their numbers of occurrences. If $prefix is specified, the returned nodes will be refined to the strings starting with that prefix. The prefix will be tokenized according to the full-text options used for creating the index.
|
Errors | BXDB0004 is raised if the full-text index is not available. |
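The prefix variant might be used as follows. This is a sketch only: DB is a hypothetical database with a full-text index, and it is assumed that the token text is the string value of each returned value element:

```xquery
(: Sketch only: lists the indexed tokens of 'DB' that start with 'hel'. :)
for $token in ft:tokens('DB', 'hel')
return string($token)
```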
ft:tokenize
Signatures | ft:tokenize($input as xs:string) as xs:string*
|
Summary | Tokenizes the given $input string, using the current default full-text options.
|
Examples |
- ft:tokenize("No Doubt") returns the two strings no and doubt.
|
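Because ft:search accepts a sequence of terms as its second argument, the result of ft:tokenize can be passed on directly. A minimal sketch, again assuming a hypothetical database DB with a full-text index:

```xquery
(: Sketch only: each token of the input is passed as a separate search term. :)
let $input := 'No Doubt'
return ft:search('DB', ft:tokenize($input))
```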
Changelog
Version 7.2
- Updated: ft:search (second argument generalized, third parameter added)
Version 7.1
- Added: ft:tokens, ft:tokenize