Changes

432 bytes added, 13:34, 20 July 2022
m
Text replacement - "db:pre(" to "db:get("
</syntaxhighlight>
Scoring is supported within full-text expressions, by {{Function|Full-Text|ft:search}}, and by simple predicate tests that can be rewritten to {{Function|Full-Text|ft:search}}:
<syntaxhighlight lang="xquery">
let $string := 'a b'
return ft:score($string contains text 'a' ftand 'b'),

for $n score $s in ft:search('factbook', 'orthodox')
order by $s descending
return $s || ': ' || $n,

for $n score $s in db:get('factbook')//text()[. contains text 'orthodox']
order by $s descending
return $s || ': ' || $n
</syntaxhighlight>
==Thesaurus==
One or more thesaurus files can be specified in a full-text expression. BaseX does not provide a default thesaurus, so the following query returns {{Code|false}}:
<syntaxhighlight lang="xquery">
'hardware' contains text 'computers' using thesaurus default
</syntaxhighlight>
If a thesaurus is employed…
<syntaxhighlight lang="xml">
<thesaurus xmlns="http://www.w3.org/2007/xqftts/thesaurus">
  <entry>
    <term>computers</term>
    <synonym>
      <term>hardware</term>
      <relationship>NT</relationship>
    </synonym>
  </entry>
</thesaurus>
</syntaxhighlight>
…the result will be {{Code|true}}:
<syntaxhighlight lang="xquery">
'hardware' contains text 'computers' using thesaurus at 'thesaurus.xml'
</syntaxhighlight>
Thesaurus files must comply with the [https://dev.w3.org/2007/xpath-full-text-10-test-suite/TestSuiteStagingArea/TestSources/thesaurus.xsd XSD Schema] of the XQFT Test Suite (but the namespace can be omitted). Apart from the relationships defined in [https://www.iso.org/standard/7776.html ISO 2788] (NT: narrower term, RT: related term, etc.), custom relationships can be used. The type of relationship and the level depth can be specified as well:
<syntaxhighlight lang="xquery">
(: BT: find broader terms; NT means narrower term :)
'computers' contains text 'hardware'
  using thesaurus at 'x.xml' relationship 'BT' from 1 to 10 levels
</syntaxhighlight>
More details can be found in the [https://www.w3.org/TR/xpath-full-text-10/#ftthesaurusoption specification].
==Fuzzy Querying==
</syntaxhighlight>
Fuzzy search is based on the Levenshtein distance. The maximum number of allowed errors is calculated by dividing the token length of a specified query term by 4, preserving a minimum of 1 error. A static error distance can be set by adjusting the {{Option|LSERROR}} option (default: <code>SET LSERROR 0</code>). The query above yields two results as there is no error between the query term “house” and the text node “house”, and one error between “house” and “hous”.
A user-defined value can be set globally via the {{Option|LSERROR}} option or, for a single query, via an additional argument:
<syntaxhighlight lang="xquery">
//a[text() contains text 'house' using fuzzy 3 errors]
</syntaxhighlight>
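The two ways of controlling the error distance can be compared on plain strings. The following is a minimal sketch, not taken from the original article: the sample words are hypothetical, no database or full-text index is involved, and which words are returned depends on the error limit in effect.
<syntaxhighlight lang="xquery">
(: Hypothetical sample words, evaluated in main memory :)
let $words := ('house', 'hous', 'horse', 'household')
return (
  (: allowed errors are derived from the length of the query term :)
  $words[. contains text 'house' using fuzzy],
  (: explicit upper limit on the number of errors :)
  $words[. contains text 'house' using fuzzy 3 errors]
)
</syntaxhighlight>
The second predicate tolerates more errors than the first, so it can be expected to return at least the same words.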
=Mixed Content=
To enable this kind of search, it is advisable to:
* Keep ''whitespace stripping'' turned off when importing XML documents by ensuring that {{Option|STRIPWS}} is disabled. This can also be done in the GUI if a new database is created (''Database'' → ''New…'' → ''Parsing'' → ''Strip Whitespaces'').
* Keep automatic indentation turned off. Ensure that the [[Serialization|serialization parameter]] {{Code|indent}} is set to {{Code|no}}.
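As an illustration of the second point, the serialization parameter can be set in the query prolog. This is a sketch under assumptions: the paragraph is a hypothetical stand-in for mixed content, and the namespace declaration is included only for processors that do not predeclare the <code>output</code> prefix.
<syntaxhighlight lang="xquery">
declare namespace output = 'http://www.w3.org/2010/xslt-xquery-serialization';
declare option output:indent 'no';

(: Hypothetical mixed-content paragraph, serialized without added whitespace :)
<p>This is <b>real</b> text.</p>
</syntaxhighlight>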
A query such as <code>//p[. contains text 'real text']</code> will then match the example paragraph above. However, the full-text index will '''not''' be used in this query, so it may take a long time. The full-text index would be used for the query <code>//p[text() contains text 'real text']</code>, but this query will not find the example paragraph, because the matching text is split over two text nodes.
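The difference between the two predicates can be reproduced in main memory. The paragraph below is a hypothetical example constructed for this sketch and is not necessarily the one used in the article:
<syntaxhighlight lang="xquery">
(: Hypothetical paragraph with mixed content :)
let $p := <p>This is <b>real</b> text.</p>
return (
  (: tested against the string value of the element: 'This is real text.' :)
  $p[. contains text 'real text'],
  (: tested against the single text nodes 'This is ' and ' text.' :)
  $p[text() contains text 'real text']
)
</syntaxhighlight>
Following the explanation above, only the first predicate should select the paragraph, as the phrase is split across text nodes in the second case.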
Note that the node structure is ignored by the full-text tokenizer: The {{Code|contains text}} expression applies all full-text operations to the ''string value'' of its left operand. As a consequence, the {{Function|Full-Text|ft:mark}} and {{Function|Full-Text|ft:extract}} functions (see [[Full-Text Module|Full-Text Functions]]) will only yield useful results if they are applied to single text nodes, as the following example demonstrates:
<syntaxhighlight lang="xquery">
</syntaxhighlight>
BaseX does '''not''' support the ''ignore option'' (<code>without content</code>) of the [https://www.w3.org/TR/xpath-full-text-10/#ftignoreoption W3C XQuery Full Text 1.0] Recommendation. If you want to ignore descendant element content, such as footnotes or other material that does not belong to the same logical text flow, you can build a second database and exclude all information you want to avoid searching for. See the following example (visit [[XQuery Update]] to learn more about updates):
<syntaxhighlight lang="xquery">
let $docs := db:get('docs')
return db:create(
'index-db',
=Changelog=
; Version 9.6
* Updated: [[#Fuzzy_Querying|Fuzzy Querying]]: Specify Levenshtein error
; Version 9.5:
* Removed: Scoring propagation.
; Version 9.2:
 
* Added: Arabic stemmer.
; Version 8.0:
 
* Updated: [[#Scoring|Scores]] will be propagated by the {{Code|and}} and {{Code|or}} expressions and in predicates.
; Version 7.7:
 
* Added: [[#Collations|Collations]] support.
; Version 7.3:
 
* Removed: Trie index, which was specialized on wildcard queries. The fuzzy index now supports both wildcard and fuzzy queries.
* Removed: TF/IDF scoring was discarded in favor of the internal scoring model.