Changes

5 bytes removed ,  13:34, 20 July 2022
m
Text replacement - "db:pre(" to "db:get("
</syntaxhighlight>
Please note that scoring propagation was removed with {{Mark|Version 9.5}}. The following expressions will now yield {{Code|0}}:

<syntaxhighlight lang="xquery">
let $string := 'a b'
return ft:score($string contains text 'a' and $string contains text 'b'),

for $n score $s in db:open('factbook')//religions[text() contains text 'orthodox']
order by $s descending
return $s || ': ' || $n
</syntaxhighlight>

Scoring is still supported within full-text expressions, by {{Function|Full-Text|ft:search}}, and by simple predicate tests that can be rewritten to {{Function|Full-Text|ft:search}}:
<syntaxhighlight lang="xquery">
return $s || ': ' || $n,
for $n score $s in db:get('factbook')//text()[. contains text 'orthodox']
order by $s descending
return $s || ': ' || $n
</syntaxhighlight>
 
Scoring propagation was removed because storing scoring values required additional memory, even when scoring was not needed.
==Thesaurus==
One or more thesaurus files can be specified in a full-text expression. The following query returns {{Code|false}}:
<syntaxhighlight lang="xquery">
'hardware' contains text 'computers'
using thesaurus default
</syntaxhighlight>
If a thesaurus is employed…

<syntaxhighlight lang="xml">
<thesaurus xmlns="http://www.w3.org/2007/xqftts/thesaurus">
  <entry>
    <term>computers</term>
    <synonym>
      <term>hardware</term>
      <relationship>NT</relationship>
    </synonym>
  </entry>
</thesaurus>
</syntaxhighlight>

…the result will be {{Code|true}}:

<syntaxhighlight lang="xquery">
'hardware' contains text 'computers'
using thesaurus at 'thesaurus.xml'
</syntaxhighlight>

Thesaurus files must comply with the [https://dev.w3.org/2007/xpath-full-text-10-test-suite/TestSuiteStagingArea/TestSources/thesaurus.xsd XSD Schema] of the XQFT Test Suite (but the namespace can be omitted). Apart from the relationships defined in [https://www.iso.org/standard/7776.html ISO 2788] (NT: narrower term, RT: related term, etc.), custom relationships can be used.

The type of relationship and the level depth can be specified as well:
<syntaxhighlight lang="xquery">
(: BT: find broader terms; NT means narrower term :)
'computers' contains text 'hardware'
using thesaurus at 'XQFTTS_1_0_4/TestSources/usability2x.xml' relationship 'BT' from 1 to 10 levels
</syntaxhighlight>
More details can be found in the [https://www.w3.org/TR/xpath-full-text-10/#ftthesaurusoption specification].
==Fuzzy Querying==
</syntaxhighlight>
Fuzzy search is based on the Levenshtein distance. The maximum number of allowed errors is calculated by dividing the token length of a specified query term by 4, with a minimum of 1 error. The query above yields two results, as there is no error between the query term “house” and the text node “house”, and one error between “house” and “hous”.
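The default error limit described above can be sketched in plain XQuery. This is only an illustration: the function name {{Code|local:max-errors}} is hypothetical, and rounding the division down with {{Code|idiv}} is an assumption that is not confirmed by the text above.

<syntaxhighlight lang="xquery">
(: Hypothetical helper: token length divided by 4, but never less than 1.
   Rounding down (idiv) is an assumption. :)
declare function local:max-errors($term as xs:string) as xs:integer {
  max((1, string-length($term) idiv 4))
};

for $term in ('hous', 'house', 'housekeeping')
return $term || ': ' || local:max-errors($term)
</syntaxhighlight>

Under these assumptions, short terms such as “house” tolerate a single error, and only terms with at least eight characters tolerate more.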
A user-defined value can be adjusted globally via the {{Option|LSERROR}} option or, since {{Version|9.6}}, via an additional argument:
<syntaxhighlight lang="xquery">
To enable this kind of search, it is advisable to:
* Keep ''whitespace stripping'' turned off when importing XML documents. This can be done by ensuring that {{Option|STRIPWS}} is disabled. This can also be done in the GUI if a new database is created (''Database'' → ''New…'' → ''Parsing'' → ''Strip Whitespaces'').
* Keep automatic indentation turned off. Ensure that the [[Serialization|serialization parameter]] {{Code|indent}} is set to {{Code|no}}.
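Both settings can also be made explicit in XQuery. The following is a minimal sketch: the database name {{Code|docs}} and the input file {{Code|docs.xml}} are hypothetical, and it assumes that the parsing option is passed to {{Code|db:create}} in lower case as {{Code|stripws}}.

<syntaxhighlight lang="xquery">
(: Create a database and explicitly keep whitespace stripping disabled.
   'docs' and 'docs.xml' are placeholder names. :)
db:create('docs', 'docs.xml', 'docs.xml', map { 'stripws': false() })
</syntaxhighlight>

<syntaxhighlight lang="xquery">
(: Separate query: serialize results without automatic indentation. :)
declare option output:indent 'no';
db:get('docs')//p
</syntaxhighlight>

The two snippets are separate queries, because the first one is an updating expression.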
A query such as <code>//p[. contains text 'real text']</code> will then match the example paragraph above. However, the full-text index will '''not''' be used in this query, so it may take a long time. The full-text index would be used for the query <code>//p[text() contains text 'real text']</code>, but this query will not find the example paragraph, because the matching text is split over two text nodes.
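The difference between the two predicates can be reproduced without a database. In this sketch, the paragraph is a hypothetical stand-in for mixed content, and no full-text index is involved:

<syntaxhighlight lang="xquery">
let $p := <p>This is <b>real</b> text.</p>
return (
  (: tested against the string value of the element: 'This is real text.' :)
  $p contains text 'real text',
  (: each text child is tested on its own: 'This is ' and ' text.' :)
  $p/text() contains text 'real text'
)
</syntaxhighlight>

The first test is expected to succeed, whereas the second one should fail, because neither of the two text nodes contains the complete phrase.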
Note that the node structure is ignored by the full-text tokenizer: The {{Code|contains text}} expression applies all full-text operations to the ''string value'' of its left operand. As a consequence, the {{Function|Full-Text|ft:mark}} and {{Function|Full-Text|ft:extract}} functions (see [[Full-Text Module|Full-Text Functions]]) will only yield useful results if they are applied to single text nodes, as the following example demonstrates:
<syntaxhighlight lang="xquery">
</syntaxhighlight>
BaseX does '''not''' support the ''ignore option'' (<code>without content</code>) of the [https://www.w3.org/TR/xpath-full-text-10/#ftignoreoption W3C XQuery Full Text 1.0] Recommendation. If you want to ignore descendant element content, such as footnotes or other material that does not belong to the same logical text flow, you can build a second database and exclude all information you want to avoid searching for. See the following example (visit [[XQuery Update]] to learn more about updates):
<syntaxhighlight lang="xquery">
let $docs := db:get('docs')
return db:create(
'index-db',