This [[Module Library|XQuery Module]] extends the [https://www.w3.org/TR/xpath-full-text-10 W3C Full Text Recommendation] with some useful functions: the index can be accessed directly, full-text results can be marked with additional elements, and the relevant parts can be extracted. Moreover, the score value generated by the {{Code|contains text}} expression can be explicitly requested from items.
=Conventions=
All functions and errors in this module are assigned to the <code><nowiki>http://basex.org/modules/ft</nowiki></code> namespace, which is statically bound to the {{Code|ft}} prefix.
=Functions=
==ft:search==

{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|ft:search|$db as xs:string, $terms as item()*|text()*}}<br/>{{Func|ft:search|$db as xs:string, $terms as item()*, $options as map(xs:string, item()*)?|text()*}}
|-
| '''Summary'''
|-
| '''Errors'''
|{{Error|db:open|Database Module#Errors}} The addressed database does not exist or could not be opened.<br/>{{Error|db:no-index|Database Module#Errors}} The index is not available.<br/>{{Error|options|#Errors}} The fuzzy and wildcard options cannot both be specified.
|-
| '''Examples'''
|
* Return all text nodes of the database {{Code|DB}} that contain the numbers {{Code|2010}} and {{Code|2020}}:<br/><code>ft:search("DB", ("2010", "2020"), map { 'mode': 'all' })</code>
* Return text nodes that contain the terms {{Code|A}} and {{Code|B}} within a distance of at most 5 words:
<syntaxhighlight lang="xquery">
ft:search("db", ("A", "B"), map {
  "mode": "all words",
  "distance": map {
    "max": "5",
    "unit": "words"
  }
})
</syntaxhighlight>
* Iterate over three databases and return all elements containing terms similar to {{Code|Hello World}} in the text nodes:
<syntaxhighlight lang="xquery">
let $terms := "Hello Worlds"
let $fuzzy := true()
for $db in 1 to 3
let $dbname := 'DB' || $db
return ft:search($dbname, $terms, map { 'fuzzy': $fuzzy })/..
</syntaxhighlight>
|}
==ft:contains==

{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|ft:contains|$input as item()*, $terms as item()*|xs:boolean}}<br/>{{Func|ft:contains|$input as item()*, $terms as item()*, $options as map(xs:string, item()*)?|xs:boolean}}
|-
| '''Summary'''
|-
| '''Errors'''
|{{Error|options|#Errors}} The specified options are conflicting (for example, both fuzzy and wildcard search have been requested).
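A minimal sketch of how the conflict can be observed, assuming that passing both {{Code|fuzzy}} and {{Code|wildcards}} as {{Code|true()}} is enough to trigger the error; the input string and term are placeholders. The catch-all clause returns the error code instead of relying on its exact name:
<syntaxhighlight lang="xquery">
try {
  (: fuzzy and wildcard matching are requested at the same time :)
  ft:contains("hello world", "hel.*", map { 'fuzzy': true(), 'wildcards': true() })
} catch * {
  (: $err:code is bound to the QName of the raised error :)
  $err:code
}
</syntaxhighlight>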
|-
| '''Examples'''
|
* Checks if {{Code|jack}} or {{Code|john}} occurs in the input string {{Code|John Doe}}:
<syntaxhighlight lang="xquery">
ft:contains("John Doe", ("jack", "john"), map { "mode": "any" })
</syntaxhighlight>
* Calls the function with stemming turned on and off:
<syntaxhighlight lang="xquery">
(true(), false()) ! ft:contains("Häuser", "Haus", map { 'stemming': ., 'language': 'de' })
</syntaxhighlight>
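* A further sketch, assuming the {{Code|wildcards}} option follows the wildcard syntax of the W3C Full Text Recommendation, in which {{Code|.*}} matches any sequence of characters; the input string and term are placeholders. As matching is case-insensitive by default, the term {{Code|hel.*}} should match the token {{Code|hello}}:
<syntaxhighlight lang="xquery">
(: '.' matches a single character, '.*' any number of characters :)
ft:contains("Hello World", "hel.*", map { 'wildcards': true() })
</syntaxhighlight>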
|}
==ft:mark==
 
{| width='100%'
|-
| '''Summary'''
|Puts a marker element around the resulting {{Code|$nodes}} of a full-text index request.<br />The default name of the marker element is {{Code|mark}}. An alternative name can be chosen via the optional {{Code|$name}} argument.<br />Please note that:
* The full-text expression that computes the token positions must be specified as argument of the <code>ft:mark()</code> function, as all position information is lost in subsequent processing steps. You may need to specify more than one full-text expression if you want to use the function in a FLWOR expression, as shown in Example 2.
* The supplied node to be transformed must be a [[Database Module#Database Node|Database Node]]. As shown in Example 3, {{Code|update}} or {{Code|transform}} expressions can be used to convert a main-memory fragment to the required internal representation.
|-
| '''Examples'''
|'''Example 1''': The following query returns {{Code|&lt;XML&gt;&lt;mark&gt;hello&lt;/mark&gt; world&lt;/XML&gt;}}, if one text node of the database {{Code|DB}} has the value "hello world":
<syntaxhighlight lang="xquery">
ft:mark(db:open('DB')//*[text() contains text 'hello'])
</syntaxhighlight>
'''Example 2''': The following expression loops through the first ten full-text results and marks the results in a second expression:
<syntaxhighlight lang="xquery">
let $start := 1
let $end := 10
let $term := 'hello' (: placeholder search term :)
for $ft in (db:open('DB')//*[text() contains text { $term }])[position() = $start to $end]
return element hit {
  (: the full-text expression is repeated so that ft:mark receives the token positions :)
  ft:mark($ft[text() contains text { $term }])
}
</syntaxhighlight>
'''Example 3''': The following expression returns <code>&lt;xml&gt;hello &lt;b&gt;world&lt;/b&gt;&lt;/xml&gt;</code>:
<syntaxhighlight lang="xquery">
copy $p := <xml>hello world</xml>
modify ()
return ft:mark($p[text() contains text 'world'], 'b')
</syntaxhighlight>
|}
==ft:extract==
 
{| width='100%'
|-
| '''Summary'''
|Extracts and returns relevant parts of full-text results. It puts a marker element around the resulting {{Code|$nodes}} of a full-text index request and chops irrelevant sections of the result.<br />The default name of the marker element is {{Code|mark}}. An alternative element name can be chosen via the optional {{Code|$name}} argument.<br />The default length of the returned text is {{Code|150}} characters. An alternative length can be specified via the optional {{Code|$length}} argument. Note that the effective text length may differ from the specified length due to formatting and readability issues.<br />For more details on this function, please have a look at [[#ft:mark|ft:mark]].
|-
| '''Examples'''
|
* The following query may return {{Code|&lt;XML&gt;...&lt;b&gt;hello&lt;/b&gt;...&lt;/XML&gt;}} if a text node of the database {{Code|DB}} contains the string "hello world":
<syntaxhighlight lang="xquery">
ft:extract(db:open('DB')//*[text() contains text 'hello'], 'b', 1)
</syntaxhighlight>
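* The same function can presumably be applied to a main-memory fragment after converting it with {{Code|copy}}/{{Code|modify}}, as in Example 3 of [[#ft:mark|ft:mark]]. This is a sketch: the input text and the length {{Code|20}} are arbitrary, and, as noted above, the effective length of the result may differ:
<syntaxhighlight lang="xquery">
(: copy/modify converts the fragment to the internal representation ft:extract needs :)
copy $p := <xml>hello world, this is a longer text that is expected to be shortened</xml>
modify ()
(: wrap matches in <b/> elements and keep roughly 20 characters of text :)
return ft:extract($p[text() contains text 'hello'], 'b', 20)
</syntaxhighlight>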
|}
==ft:tokens==

{| width='100%'
|-
| '''Errors'''
|{{Error|db:open|Database Module#Errors}} The addressed database does not exist or could not be opened.<br/>{{Error|db:no-index|Database Module#Errors}} The full-text index is not available.
|-
| '''Examples'''
|Returns the number of occurrences for a single, specific index entry (the positional predicate speeds up retrieval):
<syntaxhighlight lang="xquery">
(: normalize the placeholder term in the same way as the index entries :)
let $term := ft:tokenize('Specific')
return number(ft:tokens('db', $term)[. = $term][1]/@count)
</syntaxhighlight>
|}
==ft:tokenize==
 
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|ft:tokenize|$string as xs:string?|xs:string*}}<br/>{{Func|ft:tokenize|$string as xs:string?, $options as map(xs:string, item()*)?|xs:string*}}
|-
| '''Summary'''
|Tokenizes the given {{Code|$string}}, using the current default full-text options or the {{Code|$options}} specified as second argument, and returns a sequence with the resulting tokens. The following options are available:
* {{Code|case}}: determines how character case is processed. Allowed values are {{Code|insensitive}}, {{Code|sensitive}}, {{Code|upper}} and {{Code|lower}}. By default, search is case-insensitive.
* {{Code|diacritics}}: determines how diacritical characters are processed. Allowed values are {{Code|insensitive}} and {{Code|sensitive}}. By default, search is diacritics-insensitive.
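A minimal sketch of both variants, with placeholder input strings. Under the default (case- and diacritics-insensitive) options, the first call should return the lower-case tokens {{Code|no}} and {{Code|doubt}}, while the second call keeps the accent and should return {{Code|école}}:
<syntaxhighlight lang="xquery">
(: default options: tokens are lower-cased and diacritics are removed :)
ft:tokenize("No Doubt"),
(: diacritics are preserved when the option is set to 'sensitive' :)
ft:tokenize("École", map { 'diacritics': 'sensitive' })
</syntaxhighlight>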
|}
==ft:normalize==
 
{| width='100%'
|-
| width='120' | '''Signatures'''
|{{Func|ft:normalize|$string as xs:string?|xs:string*}}<br/>{{Func|ft:normalize|$string as xs:string?, $options as map(xs:string, item()*)?|xs:string*}}
|-
| '''Summary'''
|Normalizes the given {{Code|$string}}, using the current default full-text options or the {{Code|$options}} specified as second argument. The function expects the same arguments as [[#ft:tokenize|ft:tokenize]].
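A minimal sketch with a placeholder input string that overrides only the {{Code|case}} option: letter case should be preserved, while the umlaut is expected to be removed, since diacritics are processed insensitively by default:
<syntaxhighlight lang="xquery">
(: keep upper-case letters; diacritics stay at their default (insensitive) :)
ft:normalize("Häuser am Meer", map { 'case': 'sensitive' })
</syntaxhighlight>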
|}
=Errors=

{| class="wikitable" width="100%"
! width="110"|Code
|Description
|-
|{{Code|options}}
|Both wildcards and fuzzy search have been specified as search options.
|}
=Changelog=
 
;Version 9.1
* Updated: [[#ft:tokenize|ft:tokenize]] and [[#ft:normalize|ft:normalize]] can be called with an empty sequence.
 
;Version 9.0
 
* Updated: error codes updated; errors now use the module namespace
;Version 8.0
* Added: [[#ft:tokens|ft:tokens]], [[#ft:tokenize|ft:tokenize]]
 
[[Category:XQuery]]