Difference between revisions of "String Module"

From BaseX Documentation
Latest revision as of 18:36, 1 December 2023

This XQuery Module contains functions for string operations and computations.

Conventions

All functions and errors in this module are assigned to the http://basex.org/modules/string namespace, which is statically bound to the string prefix.

Computations

string:levenshtein

Signature
string:levenshtein(
  $string1  as xs:string,
  $string2  as xs:string
) as xs:double
Summary Computes the Damerau-Levenshtein distance for two strings and returns a similarity value as a double between 0.0 and 1.0. The returned value is computed as follows:
  • 1.0 – distance / max(length of strings)
  • 1.0 is returned if the strings are equal; 0.0 is returned if the strings are too different.
Examples
  • string:levenshtein("flower", "flower") returns 1
  • string:levenshtein("flower", "lewes") returns 0.5
  • In the following query, the input is first normalized (words are stemmed, converted to lower case, and diacritics are removed). It returns 1:
let $norm := ft:normalize(?, map { 'stemming': true() })
return string:levenshtein($norm("HOUSES"), $norm("house"))
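
As a check on the formula, the documented result of 0.5 for "flower" and "lewes" is consistent with a distance of 3 and a maximum length of 6 (1.0 – 3 / 6). One possible use of the score is ranking candidate strings against a search term. The following is a minimal sketch; the candidate list and the threshold of 0.6 are assumptions chosen for illustration and are not part of the module:

```xquery
(: Hypothetical search term, candidates and threshold, for illustration only. :)
let $query := "flower"
let $candidates := ("flower", "flowers", "lewes", "tower")
let $threshold := 0.6
for $candidate in $candidates
(: The score is 1.0 for equal strings and drops towards 0.0 as they diverge. :)
let $score := string:levenshtein($query, $candidate)
where $score >= $threshold
order by $score descending
return $candidate || " (" || $score || ")"
```

A suitable threshold depends on the data; shorter strings lose more of their score per edit than longer ones.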

string:soundex

Signature
string:soundex(
  $string  as xs:string
) as xs:string
Summary Computes the Soundex value for the specified string. The algorithm can be used to find and index English words with similar pronunciation.
Examples
  • string:soundex("Michael") returns M240
  • string:soundex("OBrien") = string:soundex("O'Brien") returns true
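
Because similar-sounding words share a code, the function can serve as a grouping key. The sketch below assumes a hypothetical list of names and simply groups them by their Soundex value:

```xquery
(: Hypothetical name list, used only for illustration. :)
let $names := ("Michael", "Mitchell", "Robert", "Rupert", "OBrien", "O'Brien")
for $name in $names
(: Names with the same Soundex value end up in the same group. :)
group by $code := string:soundex($name)
order by $code
return $code || ": " || string-join($name, ", ")
```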

string:cologne-phonetic

Signature
string:cologne-phonetic(
  $string  as xs:string
) as xs:string
Summary Computes the Kölner Phonetik value for the specified string. Similar to Soundex, the algorithm is used to find similarly pronounced words, but for the German language. As the first returned digit can be 0, the value is returned as string.
Examples
  • string:cologne-phonetic("Michael") returns 645
  • every $s in ("Mayr", "Maier", "Meier") satisfies string:cologne-phonetic($s) = "67" returns true
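
The codes can also be used to look up spelling variants of a name. The following sketch assumes a hypothetical list of surnames and filters it by the code of a query name. Based on the codes given in the examples above, the filter should keep the three Meier variants and drop "Michael":

```xquery
(: Hypothetical surname list and query name, for illustration only. :)
let $names := ("Mayr", "Maier", "Meier", "Michael")
let $code := string:cologne-phonetic("Meier")
(: Keep every name whose phonetic code equals the code of the query name. :)
return $names[string:cologne-phonetic(.) = $code]
```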

Formatting

string:format

Signature
string:format(
  $pattern    as xs:string,
  $values...  as item()
) as xs:string
Summary Returns a formatted string. The remaining $values are incorporated into the $pattern, according to Java’s printf syntax.
Errors format: The specified format is not valid.
Examples
  • string:format("%b", true()) returns true.
  • string:format("%06d", 256) returns 000256.
  • string:format("%e", 1234.5678) returns 1.234568e+03.
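
A pattern may contain several placeholders, which consume the supplied values in order. The sketch below uses hypothetical values and also shows one way to handle an invalid pattern. It assumes that an unknown conversion such as %q triggers the format error described above, and it uses a generic catch clause rather than a specific error name:

```xquery
(
  (: Hypothetical values: one string, two integers and one decimal. :)
  string:format("%s: %d of %d (%.1f%%)", "imported", 3, 8, 37.5),
  (: Assumption: %q is not a valid conversion, so an error is expected here. :)
  try {
    string:format("%q", 1)
  } catch * {
    "Invalid pattern: " || $err:description
  }
)
```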

string:cr

Signature string:cr() as xs:string
Summary Returns a single carriage return character (&#13;).

string:nl

Signature string:nl() as xs:string
Summary Returns a single newline character (&#10;).

string:tab

Signature string:tab() as xs:string
Summary Returns a single tabulator character (&#9;).
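
The character functions are convenient as separators when building plain-text output. The following sketch assumes a hypothetical set of rows and joins them into tab-separated lines:

```xquery
(: Hypothetical rows; each array holds the cells of one line. :)
let $rows := (["id", "word"], ["1", "flower"], ["2", "lewes"])
return string-join(
  (
    for $row in $rows
    (: Join the cells of a row with tabs. :)
    return string-join($row?*, string:tab())
  ),
  (: Join the rows with newlines. :)
  string:nl()
)
```

Using string:cr() || string:nl() as the outer separator would produce Windows-style line endings instead.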

Changelog

Version 10.0
  • Renamed from Strings Module to String Module. The namespace URI has been updated as well.

The Module was introduced with Version 8.3. Functions were adopted from the obsolete Utility and Output Modules.