Changes

Full-Text (edit)

Revision as of 13:34, 20 July 2022

573 bytes removed, 13:34, 20 July 2022

m

Text replacement - "db:pre(" to "db:get("

</syntaxhighlight>

~~Please note that scoring propagation was removed with {{Mark|Version 9.5}}. The following expressions will now yield {{Code|0}}:~~

~~<syntaxhighlight lang="xquery">~~
~~let $string := 'a b'~~
~~return ft:score($string contains text 'a' and $string contains text 'b'),~~
~~for $n score $s in db:open('factbook')//religions[text() contains text 'orthodox']~~
~~order by $s descending~~
~~return $s || ': ' || $n~~
~~</syntaxhighlight>~~

Scoring is ~~still~~ supported within full-text expressions, by {{Function|Full-Text|ft:search}}, and by simple predicate tests that can be rewritten to {{Function|Full-Text|ft:search}}:

return $s || ': ' || $n,

for $n score $s in db:~~open~~get('factbook')//text()[. contains text 'orthodox']

order by $s descending

return $s || ': ' || $n

</syntaxhighlight>

~~The reason for removing the scoring propagation was that the storage of scoring values required additional memory, even if scoring is not required.~~

==Thesaurus==

Fuzzy search is based on the Levenshtein distance. The maximum number of allowed errors is calculated by dividing the token length of a specified query term by 4. The query above yields two results as there is no error between the query term “house” and the text node “house”, and one error between “house” and “hous”.

A user-defined value can be adjusted globally via the {{Option|LSERROR}} option or~~, since {{Version|9.6}},~~ via an additional argument:
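The two variants can be sketched side by side. This is a minimal illustration on hypothetical inline data (the {{Code|doc}} element is made up for this sketch), and it assumes that the error limit is written as <code>fuzzy N errors</code>:

<syntaxhighlight lang="xquery">
(: Hypothetical sample data; no database or full-text index is needed. :)
let $doc := <doc>hous</doc>
return (
  (: Default limit: derived from the token length, or taken from LSERROR. :)
  $doc[text() contains text 'house' using fuzzy],
  (: Assumed syntax: explicit limit of 2 errors for this expression only. :)
  $doc[text() contains text 'house' using fuzzy 2 errors]
)
</syntaxhighlight>

The explicit limit only affects the expression it is attached to, whereas {{Option|LSERROR}} changes the default for all fuzzy searches.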

To enable this kind of search, it is advisable to:

* ~~Turn off~~ Keep ''whitespace ~~chopping~~stripping'' turned off when importing XML documents. This can be done by ~~setting~~ ensuring that {{Option|~~CHOP~~STRIPWS}} ~~to <code>OFF</code>~~is disabled. This can also be done in the GUI if a new database is created (''Database'' → ''New…'' → ''Parsing'' → ''~~Chop~~ Strip Whitespaces'').
* ~~Turn off~~ Keep automatic indentation ~~by assigning <code>~~turned off. Ensure that the [[Serialization|serialization parameter]] {{Code|indent~~=no</code>~~}} is set to ~~the~~ {{~~Option~~Code|~~SERIALIZER~~no}} ~~option~~.
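For a single query, one way to set the serialization parameter is a standard option declaration in the query prolog. The following is a sketch only; the paragraph is hypothetical sample data, and the namespace declaration is spelled out for portability:

<syntaxhighlight lang="xquery">
(: Standard serialization namespace, declared explicitly for portability. :)
declare namespace output = 'http://www.w3.org/2010/xslt-xquery-serialization';
(: Disable indentation so that no whitespace is added to mixed content. :)
declare option output:indent 'no';

(: Hypothetical mixed-content paragraph. :)
<p>This is <b>real</b> text.</p>
</syntaxhighlight>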

A query such as <code>//p[. contains text 'real text']</code> will then match the example paragraph above. However, the full-text index will '''not''' be used in this query, so it may take a long time. The full-text index would be used for the query <code>//p[text() contains text 'real text']</code>, but this query will not find the example paragraph, because the matching text is split over two text nodes.
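The difference between the two predicates can be sketched on a hypothetical inline paragraph (not the example from this page) in which the phrase is split by a {{Code|b}} element:

<syntaxhighlight lang="xquery">
(: Hypothetical paragraph: its string value is 'This is real text.',
   but 'real' and ' text.' are stored in different text nodes. :)
let $p := <p>This is <b>real</b> text.</p>
return (
  (: Tested against the string value of the whole element. :)
  $p[. contains text 'real text'],
  (: Tested against each child text node of p separately;
     none of them contains both tokens. :)
  $p[text() contains text 'real text']
)
</syntaxhighlight>

Only the first predicate is expected to select the paragraph, because the phrase exists in the string value of the element but in none of its individual text nodes.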

Note that the node structure is ignored by the full-text tokenizer: The {{Code|contains text}} expression applies all full-text operations to the ''string value'' of its left operand. As a consequence, the ~~<code>~~{{Function|Full-Text|ft:mark~~</code>~~ }} and ~~<code>~~{{Function|Full-Text|ft:extract~~</code>~~ }} functions ~~(see [[Full-Text Module|Full-Text Functions]])~~ will only yield useful results if they are applied to single text nodes, as the following example demonstrates:

</syntaxhighlight>

BaseX does '''not''' support the ''ignore option'' (<code>without content</code>) of the [https://www.w3.org/TR/xpath-full-text-10/#ftignoreoption W3C XQuery Full Text 1.0] Recommendation. If you want to ignore descendant element content, such as footnotes or other material that does not belong to the same logical text flow, you can build a second database from the original one and exclude all information you ~~do not~~ want to ~~search~~ avoid searching for. See the following example (visit [[XQuery Update]] to learn more about updates):

let $docs := db:~~open~~get('docs')

return db:create(

'index-db',

CG

