Difference between revisions of "String Module"

From BaseX Documentation

Revision as of 17:13, 16 September 2015

This XQuery Module contains functions for string computations.

Conventions

All functions and errors in this module are assigned to the http://basex.org/modules/strings namespace, which is statically bound to the strings prefix.

Functions

strings:levenshtein

Signatures strings:levenshtein($string1 as xs:string, $string2 as xs:string) as xs:double
Summary Computes the Damerau-Levenshtein Distance for two strings and returns a normalized similarity as a double value between 0.0 and 1.0. The value is computed as follows:
  • 1.0 - distance / max(length of strings)
  • 1.0 is returned if the strings are equal; 0.0 is returned if the strings are too different.
Examples
  • strings:levenshtein("flower", "flower") returns 1
  • strings:levenshtein("flower", "lewes") returns 0.5
  • In the following query, the input is first normalized (words are stemmed, converted to lower case, and diacritics are removed). It returns 1:
let $norm := ft:normalize(?, map { 'stemming': true() })
return strings:levenshtein($norm("HOUSES"), $norm("house"))
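
The normalization formula in the summary can be checked outside of BaseX. The following Python sketch is not the BaseX implementation: it assumes the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance, and the names osa_distance and similarity are hypothetical. It reproduces the first two example values above.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Assumption: the documentation does not say which variant BaseX uses.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a: str, b: str) -> float:
    """1.0 - distance / max(length of strings), as stated in the summary."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as equal
    return 1.0 - osa_distance(a, b) / longest


print(similarity("flower", "flower"))  # 1.0
print(similarity("flower", "lewes"))   # 0.5 (distance 3, longer string has 6 characters)
```

For "flower" and "lewes", one deletion and two substitutions give a distance of 3, so the result is 1.0 - 3 / 6 = 0.5.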

strings:soundex

Signatures strings:soundex($string as xs:string) as xs:string
Summary Computes the Soundex value for the specified string. The algorithm can be used to find and index English words with similar pronunciation.
Examples
  • strings:soundex("Michael") returns M240
  • strings:soundex("OBrien") = strings:soundex("O'Brien") returns true
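
To illustrate how such codes come about, here is a minimal Python sketch of the classic American Soundex rules. It is an assumption that BaseX follows exactly these rules. The function name soundex, the handling of non-letters (they are dropped), and the empty result for input without letters are choices made for this sketch only.

```python
# Letter groups of the classic American Soundex scheme
CODES = {}
for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in group:
        CODES[letter] = digit


def soundex(word: str) -> str:
    """Return a four-character Soundex code (sketch, not the BaseX code)."""
    # Keep ASCII letters only, so that "O'Brien" and "OBrien" are treated alike
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""  # assumption: no letters yields an empty code
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # vowels have no code and reset the previous code
    return (result + "000")[:4]  # pad with zeros and cut to four characters


print(soundex("Michael"))                       # M240
print(soundex("OBrien") == soundex("O'Brien"))  # True
```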

strings:cologne-phonetic

Signatures strings:cologne-phonetic($string as xs:string) as xs:string
Summary Computes the Kölner Phonetik value for the specified string. Similar to Soundex, the algorithm is used to find similarly pronounced words, but for the German language. As the first returned digit can be 0, the value is returned as a string.
Examples
  • strings:cologne-phonetic("Michael") returns 645
  • every $s in ("Mayr", "Maier", "Meier") satisfies strings:cologne-phonetic($s) = "67" returns true
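
The following Python sketch applies the commonly published Kölner Phonetik rule table: map each letter to a digit depending on its neighbours, collapse adjacent duplicates, then drop every 0 except a leading one. It is an illustration under the assumption that BaseX follows the same table. The function name cologne_phonetic and the umlaut folding are choices made for this sketch.

```python
def cologne_phonetic(word: str) -> str:
    """Koelner Phonetik code as a digit string (sketch, not the BaseX code)."""
    # upper() turns "ß" into "SS"; umlauts are folded to their base vowels
    text = word.upper().translate(str.maketrans("ÄÖÜ", "AOU"))
    letters = [ch for ch in text if "A" <= ch <= "Z"]
    codes = []
    for i, ch in enumerate(letters):
        prev = letters[i - 1] if i > 0 else ""
        nxt = letters[i + 1] if i + 1 < len(letters) else ""
        if ch in "AEIJOUY":
            code = "0"
        elif ch == "H":
            code = ""  # H is ignored
        elif ch == "B":
            code = "1"
        elif ch == "P":
            code = "3" if nxt == "H" else "1"
        elif ch in "DT":
            code = "8" if nxt in set("CSZ") else "2"
        elif ch in "FVW":
            code = "3"
        elif ch in "GKQ":
            code = "4"
        elif ch == "C":
            if i == 0:
                code = "4" if nxt in set("AHKLOQRUX") else "8"
            elif prev in set("SZ"):
                code = "8"
            else:
                code = "4" if nxt in set("AHKOQUX") else "8"
        elif ch == "X":
            code = "8" if prev in set("CKQ") else "48"
        elif ch == "L":
            code = "5"
        elif ch in "MN":
            code = "6"
        elif ch == "R":
            code = "7"
        else:  # S and Z
            code = "8"
        codes.append(code)
    # Collapse runs of the same digit
    collapsed = ""
    for digit in "".join(codes):
        if not collapsed or collapsed[-1] != digit:
            collapsed += digit
    # Remove every 0 except a leading one
    return collapsed[:1] + collapsed[1:].replace("0", "")


print(cologne_phonetic("Michael"))  # 645
print(all(cologne_phonetic(s) == "67" for s in ("Mayr", "Maier", "Meier")))  # True
```

The neighbour checks use sets rather than strings so that a missing neighbour (the empty string) never matches.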

Changelog

The module was introduced with Version 8.3.