9,646 bytes added ,  12:33, 2 July 2020
Text replacement - "[http://www.w3.org/TR/xpath" to "[https://www.w3.org/TR/xpath"
This [[Module Library|XQuery Module]] extends the [https://www.w3.org/TR/xpath-full-text-10 W3C Full Text Recommendation] with some useful functions: The index can be directly accessed, full-text results can be marked with additional elements, or the relevant parts can be extracted. Moreover, the score value, which is generated by the {{Code|contains text}} expression, can be explicitly requested from items.

=Conventions=

All functions and errors in this module are assigned to the <code><nowiki>http://www.basex.org/modules/ft</nowiki></code> namespace, which is statically bound to the {{Code|ft}} prefix.<br/>

=Functions=
==ft:search==
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|ft:search|$db as xs:string, $terms as item()*|text()*}}<br/>{{Func|ft:search|$db as xs:string, $terms as item()*, $options as map(*)?|text()*}}
|-
| '''Summary'''
|Returns all text nodes from the full-text index of the database {{Code|$db}} that contain the specified {{Code|$terms}}.<br/>The options used for tokenizing the input and building the full-text index will also be applied to the search terms. As an example, if the index terms have been stemmed, the search string will be stemmed as well.
The {{Code|$options}} argument can be used to control full-text processing. The following options are supported (the introduction on [[Full-Text]] processing gives you equivalent expressions in the XQuery Full-Text notation):
* {{Code|mode}}: determines how tokens are searched. Allowed values are {{Code|any}}, {{Code|any word}}, {{Code|all}}, {{Code|all words}}, and {{Code|phrase}}. {{Code|any}} is the default search mode.
* {{Code|fuzzy}}: turns fuzzy querying on or off. Allowed values are {{Code|true}} and {{Code|false}}. By default, fuzzy querying is turned off.
* {{Code|wildcards}}: turns wildcard querying on or off. Allowed values are {{Code|true}} and {{Code|false}}. By default, wildcard querying is turned off.
* {{Code|ordered}}: requires that all tokens occur in the order in which they are specified. Allowed values are {{Code|true}} and {{Code|false}}. The default is {{Code|false}}.
* {{Code|content}}: specifies that the matched tokens need to occur at the beginning or end of a searched string, or need to cover the entire string. Allowed values are {{Code|start}}, {{Code|end}}, and {{Code|entire}}. By default, the option is turned off.
* {{Code|scope}}: defines the scope in which tokens must be located. The option has the following suboptions:
** {{Code|same}}: can be set to {{Code|true}} or {{Code|false}}. It specifies if tokens need to occur in the same or different units.
** {{Code|unit}}: can be {{Code|sentence}} or {{Code|paragraph}}. It specifies the unit for finding tokens.
* {{Code|window}}: sets up a window in which all tokens must be located. By default, the option is turned off. It has the following suboptions:
** {{Code|size}}: specifies the size of the window in terms of ''units''.
** {{Code|unit}}: can be {{Code|words}}, {{Code|sentences}} or {{Code|paragraphs}}. The default is {{Code|words}}.
* {{Code|distance}}: specifies the distance in which tokens must occur. By default, the option is turned off. It has the following suboptions:
** {{Code|min}}: specifies the minimum distance in terms of ''units''. The default is {{Code|0}}.
** {{Code|max}}: specifies the maximum distance in terms of ''units''. The default is {{Code|∞}}.
** {{Code|unit}}: can be {{Code|words}}, {{Code|sentences}} or {{Code|paragraphs}}. The default is {{Code|words}}.
|-
| '''Errors'''
|{{Error|db:open|Database Module#Errors}} The addressed database does not exist or could not be opened.<br/>{{Error|db:no-index|Database Module#Errors}} The index is not available.<br/>{{Error|options|#Errors}} The fuzzy and wildcard options cannot both be specified.
|-
| '''Examples'''
|
* {{Code|ft:search("DB", "QUERY")}}: Return all text nodes of the database {{Code|DB}} that contain the term {{Code|QUERY}}.
* Return all text nodes of the database {{Code|DB}} that contain the numbers {{Code|2010}} and {{Code|2020}}:<br/><code>ft:search("DB", ("2010", "2020"), map { 'mode': 'all' })</code>
* Return text nodes that contain the terms {{Code|A}} and {{Code|B}} in a distance of at most 5 words:
<syntaxhighlight lang="xquery">
ft:search("db", ("A", "B"), map {
  "mode": "all words",
  "distance": map {
    "max": "5",
    "unit": "words"
  }
})
</syntaxhighlight>
* Iterate over three databases and return all elements containing terms similar to {{Code|Hello World}} in the text nodes:
<syntaxhighlight lang="xquery">
let $terms := "Hello Worlds"
let $fuzzy := true()
for $db in 1 to 3
let $dbname := 'DB' || $db
return ft:search($dbname, $terms, map { 'fuzzy': $fuzzy })/..
</syntaxhighlight>
|}

==ft:contains==

{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|ft:contains|$input as item()*, $terms as item()*|xs:boolean}}<br/>{{Func|ft:contains|$input as item()*, $terms as item()*, $options as map(*)?|xs:boolean}}
|-
| '''Summary'''
|Checks if the specified {{Code|$input}} items contain the specified {{Code|$terms}}.<br/>The function does the same as the [[Full-Text]] expression {{Code|contains text}}, but options can be specified more dynamically. The {{Code|$options}} are the same as for [[#ft:search|ft:search]], and the following ones are supported in addition:
* {{Code|case}}: determines how character case is processed. Allowed values are {{Code|insensitive}}, {{Code|sensitive}}, {{Code|upper}} and {{Code|lower}}. By default, search is case insensitive.
* {{Code|diacritics}}: determines how diacritical characters are processed. Allowed values are {{Code|insensitive}} and {{Code|sensitive}}. By default, search is diacritics insensitive.
* {{Code|stemming}}: determines if tokens are stemmed. Allowed values are {{Code|true}} and {{Code|false}}. By default, stemming is turned off.
* {{Code|language}}: determines the language. This option is relevant for stemming tokens. All language codes are supported. The default language is {{Code|en}}.
|-
| '''Errors'''
|{{Error|options|#Errors}} The specified options are conflicting.
|-
| '''Examples'''
|
* Checks if {{Code|jack}} or {{Code|john}} occurs in the input string {{Code|John Doe}}:
<syntaxhighlight lang="xquery">
ft:contains("John Doe", ("jack", "john"), map { "mode": "any" })
</syntaxhighlight>
* Calls the function with stemming turned on and off:
<syntaxhighlight lang="xquery">
(true(), false()) ! ft:contains("Häuser", "Haus", map { 'stemming': ., 'language': 'de' })
</syntaxhighlight>
|}
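
The {{Code|case}} option described above can be explored with a minimal sketch like the following. The input string and the search term are illustrative assumptions; the query calls the function once per option value, so that the default, case-insensitive behavior can be compared with a case-sensitive match:
<syntaxhighlight lang="xquery">
(: Sketch: run the same check with both case settings.
   "John Doe" and "john" are illustrative values. :)
for $case in ('insensitive', 'sensitive')
return ft:contains("John Doe", "john", map { 'case': $case })
</syntaxhighlight>
As the option value is an ordinary string, it can be computed at runtime, which is the main difference from the static {{Code|contains text}} notation.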
==ft:mark==
 {|width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|ft:mark|$nodes as node()*|node()*}}<br/>{{Func|ft:mark|$nodes as node()*, $name as xs:string|node()*}}
|-
| '''Summary'''
|Puts a marker element around the resulting {{Code|$nodes}} of a full-text index request.<br/>The default name of the marker element is {{Code|mark}}. An alternative name can be chosen via the optional {{Code|$name}} argument.<br/>Please note that:
* The full-text expression that computes the token positions must be specified as argument of the <code>ft:mark()</code> function, as all position information is lost in subsequent processing steps. You may need to specify more than one full-text expression if you want to use the function in a FLWOR expression, as shown in Example 2.
* The supplied node must be a [[Database Module#Database Node|Database Node]]. As shown in Example 3, {{Code|update}} or {{Code|transform}} can be used to convert a main-memory fragment to the required internal representation.
|-
| '''Examples'''
|'''Example 1''': The following query returns {{Code|&lt;XML&gt;&lt;mark&gt;hello&lt;/mark&gt; world&lt;/XML&gt;}}, if one text node of the database {{Code|DB}} has the value "hello world":
<syntaxhighlight lang="xquery">
ft:mark(db:open('DB')//*[text() contains text 'hello'])
</syntaxhighlight>
'''Example 2''': The following expression loops through the first ten full-text results and marks the results in a second expression:
<syntaxhighlight lang="xquery">
let $start := 1
let $end := 10
let $term := 'welcome'
for $ft in (db:open('DB')//*[text() contains text { $term }])[position() = $start to $end]
return element hit {
  ft:mark($ft[text() contains text { $term }])
}
</syntaxhighlight>
'''Example 3''': The following expression returns <code>&lt;xml&gt;hello &lt;b&gt;word&lt;/b&gt;&lt;/xml&gt;</code>:
<syntaxhighlight lang="xquery">
copy $p := <xml>hello word</xml>
modify ()
return ft:mark($p[text() contains text 'word'], 'b')
</syntaxhighlight>
|}
==ft:extract==
 {|width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|ft:extract|$nodes as node()*|node()*}}<br/>{{Func|ft:extract|$nodes as node()*, $name as xs:string|node()*}}<br/>{{Func|ft:extract|$nodes as node()*, $name as xs:string, $length as xs:integer|node()*}}
|-
| '''Summary'''
|Extracts and returns relevant parts of full-text results. It puts a marker element around the resulting {{Code|$nodes}} of a full-text index request and chops irrelevant sections of the result.<br/>The default element name of the marker element is {{Code|mark}}. An alternative element name can be chosen via the optional {{Code|$name}} argument.<br/>The default length of the returned text is {{Code|150}} characters. An alternative length can be specified via the optional {{Code|$length}} argument. Note that the effective text length may differ from the specified length due to formatting and readability issues.<br/>For more details on this function, please have a look at [[#ft:mark|ft:mark]].
|-
| '''Examples'''
|
* The following query may return {{Code|&lt;XML&gt;...&lt;b&gt;hello&lt;/b&gt;...&lt;/XML&gt;}} if a text node of the database {{Code|DB}} contains the string "hello world":
<syntaxhighlight lang="xquery">
ft:extract(db:open('DB')//*[text() contains text 'hello'], 'b', 1)
</syntaxhighlight>
|}

==ft:count==

{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|ft:count|$nodes as node()*|xs:integer}}
|-
| '''Summary'''
|Returns the number of occurrences of the search terms specified in a full-text expression.
|-
| '''Examples'''
|
* {{Code|ft:count(//*[text() contains text 'QUERY'])}} returns the {{Code|xs:integer}} value {{Code|2}} if a document contains two occurrences of the string "QUERY".
|}
==ft:score==
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|ft:score|$item as item()*|xs:double*}}
|-
| '''Summary'''
|Returns the score values (0.0 - 1.0) that have been attached to the specified items. {{Code|0}} is returned as value if no score was attached.
|-
| '''Examples'''
|
* {{Code|ft:score('a' contains text 'a')}} returns the {{Code|xs:double}} value {{Code|1}}.
|}

==ft:tokens==

{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|ft:tokens|$db as xs:string|element(value)*}}<br/>{{Func|ft:tokens|$db as xs:string, $prefix as xs:string|element(value)*}}
|-
| '''Summary'''
|Returns all full-text tokens stored in the index of the database {{Code|$db}}, along with their numbers of occurrences.<br/>If {{Code|$prefix}} is specified, the returned nodes will be refined to the strings starting with that prefix. The prefix will be tokenized according to the full-text options used for creating the index.
|-
| '''Errors'''
|{{Error|db:open|Database Module#Errors}} The addressed database does not exist or could not be opened.<br/>{{Error|db:no-index|Database Module#Errors}} The full-text index is not available.
|-
| '''Examples'''
|Returns the number of occurrences for a single, specific index entry:
<syntaxhighlight lang="xquery">
(: $input is an illustrative single-word search string :)
let $input := 'QUERY'
let $term := ft:tokenize($input)
return number(ft:tokens('db', $term)[. = $term]/@count)
</syntaxhighlight>
|}

==ft:tokenize==

{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|ft:tokenize|$string as xs:string?|xs:string*}}<br/>{{Func|ft:tokenize|$string as xs:string?, $options as map(*)?|xs:string*}}
|-
| '''Summary'''
|Tokenizes the given {{Code|$string}}, using the current default full-text options or the {{Code|$options}} specified as second argument, and returns a sequence with the tokenized string. The {{Code|$options}} argument can be used to control full-text processing. The following options are available:
* {{Code|case}}: determines how character case is processed. Allowed values are {{Code|insensitive}}, {{Code|sensitive}}, {{Code|upper}} and {{Code|lower}}. By default, search is case insensitive.
* {{Code|diacritics}}: determines how diacritical characters are processed. Allowed values are {{Code|insensitive}} and {{Code|sensitive}}. By default, search is diacritics insensitive.
* {{Code|stemming}}: determines if tokens are stemmed. Allowed values are {{Code|true}} and {{Code|false}}. By default, stemming is turned off.
* {{Code|language}}: determines the language. This option is relevant for stemming tokens. All language codes are supported. The default language is {{Code|en}}.
|-
| '''Examples'''
|
* <code>ft:tokenize("No Doubt")</code> returns the two strings {{Code|no}} and {{Code|doubt}}.
* <code>ft:tokenize("École", map { 'diacritics': 'sensitive' })</code> returns the string {{Code|école}}.
* <code>declare ft-option using stemming; ft:tokenize("GIFTS")</code> returns a single string {{Code|gift}}.
|}

==ft:normalize==

{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|ft:normalize|$string as xs:string?|xs:string}}<br/>{{Func|ft:normalize|$string as xs:string?, $options as map(*)?|xs:string}}
|-
| '''Summary'''
|Normalizes the given {{Code|$string}}, using the current default full-text options or the {{Code|$options}} specified as second argument. The function expects the same arguments as [[#ft:tokenize|ft:tokenize]].
|-
| '''Examples'''
|
* <code>ft:normalize("Häuser am Meer", map { 'case': 'sensitive' })</code> returns the string {{Code|Hauser am Meer}}.
|}
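
According to the signatures above, [[#ft:tokenize|ft:tokenize]] returns a sequence of strings, whereas [[#ft:normalize|ft:normalize]] returns a single string. The following sketch, which uses an illustrative input string and the default full-text options, calls both functions on the same input so that the two result shapes can be compared:
<syntaxhighlight lang="xquery">
(: Sketch: the input string is an illustrative value. :)
let $string := "Häuser am Meer"
return (
  (: one item per token :)
  count(ft:tokenize($string)),
  (: always a single string :)
  ft:normalize($string)
)
</syntaxhighlight>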
=Errors=

{| class="wikitable" width="100%"
! width="110"|Code
|Description
|-
|{{Code|options}}
|Both wildcards and fuzzy search have been specified as search options.
|}

=Changelog=

;Version 9.1
* Updated: [[#ft:tokenize|ft:tokenize]] and [[#ft:normalize|ft:normalize]] can be called with empty sequence.

;Version 9.0
* Updated: error codes updated; errors now use the module namespace

;Version 8.0
* Added: [[#ft:contains|ft:contains]], [[#ft:normalize|ft:normalize]]
* Updated: Options added to [[#ft:tokenize|ft:tokenize]]

;Version 7.8
* Added: [[#ft:contains|ft:contains]]
* Updated: Options added to [[#ft:search|ft:search]]

;Version 7.7
* Updated: the functions no longer accept [[Database Module#Database Nodes|Database Nodes]] as reference. Instead, the name of a database must now be specified.

;Version 7.2
* Updated: [[#ft:search|ft:search]] (second argument generalized, third parameter added)

;Version 7.1
* Added: [[#ft:tokens|ft:tokens]], [[#ft:tokenize|ft:tokenize]]