Difference between revisions of "String Module"

From BaseX Documentation
(One intermediate revision by the same user not shown)
Line 1: Line 1:
 
{{Announce|Updated with Version 10:}} Renamed from ''Strings Module'' to ''String Module''. The namespace URI has been updated as well.
 
{{Announce|Updated with Version 10:}} Renamed from ''Strings Module'' to ''String Module''. The namespace URI has been updated as well.
  
This [[Module Library|XQuery Module]] contains functions for string computations.
+
This [[Module Library|XQuery Module]] contains functions for string operations and computations.
  
 
=Conventions=
 
=Conventions=
Line 7: Line 7:
 
All functions and errors in this module are assigned to the <code><nowiki>http://basex.org/modules/string</nowiki></code> namespace, which is statically bound to the {{Code|string}} prefix.<br/>
 
All functions and errors in this module are assigned to the <code><nowiki>http://basex.org/modules/string</nowiki></code> namespace, which is statically bound to the {{Code|string}} prefix.<br/>
  
=Functions=
+
=Computations=
  
 
==string:levenshtein==
 
==string:levenshtein==
  
 
{| width='100%'
 
{| width='100%'
|-
+
|- valign="top"
 
| width='120' | '''Signatures'''
 
| width='120' | '''Signatures'''
 
|{{Func|string:levenshtein|$string1 as xs:string, $string2 as xs:string|xs:double}}<br/>
 
|{{Func|string:levenshtein|$string1 as xs:string, $string2 as xs:string|xs:double}}<br/>
|-
+
|- valign="top"
 
| '''Summary'''
 
| '''Summary'''
 
|Computes the [https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance Damerau-Levenshtein Distance] for two strings and returns a double value ({{Code|0.0}} - {{Code|1.0}}). The returned value is computed as follows:<br/>
 
|Computes the [https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance Damerau-Levenshtein Distance] for two strings and returns a double value ({{Code|0.0}} - {{Code|1.0}}). The returned value is computed as follows:<br/>
 
* <code>1.0</code> – distance / max(length of strings)
 
* <code>1.0</code> – distance / max(length of strings)
 
* <code>1.0</code> is returned if the strings are equal; <code>0.0</code> is returned if the strings are too different.
 
* <code>1.0</code> is returned if the strings are equal; <code>0.0</code> is returned if the strings are too different.
|-
+
|- valign="top"
 
| '''Examples'''
 
| '''Examples'''
 
|
 
|
Line 35: Line 35:
  
 
{| width='100%'
 
{| width='100%'
|-
+
|- valign="top"
 
| width='120' | '''Signatures'''
 
| width='120' | '''Signatures'''
 
|{{Func|string:soundex|$string as xs:string|xs:string}}<br/>
 
|{{Func|string:soundex|$string as xs:string|xs:string}}<br/>
|-
+
|- valign="top"
 
| '''Summary'''
 
| '''Summary'''
 
|Computes the [https://en.wikipedia.org/wiki/Soundex Soundex] value for the specified string. The algorithm can be used to find and index English words with similar pronunciation.
 
|Computes the [https://en.wikipedia.org/wiki/Soundex Soundex] value for the specified string. The algorithm can be used to find and index English words with similar pronunciation.
|-
+
|- valign="top"
 
| '''Examples'''
 
| '''Examples'''
 
|
 
|
Line 51: Line 51:
  
 
{| width='100%'
 
{| width='100%'
|-
+
|- valign="top"
 
| width='120' | '''Signatures'''
 
| width='120' | '''Signatures'''
 
|{{Func|string:cologne-phonetic|$string as xs:string|xs:string}}<br/>
 
|{{Func|string:cologne-phonetic|$string as xs:string|xs:string}}<br/>
|-
+
|- valign="top"
 
| '''Summary'''
 
| '''Summary'''
 
|Computes the [https://de.wikipedia.org/wiki/K%C3%B6lner_Phonetik Kölner Phonetik] value for the specified string. Similar to Soundex, the algorithm is used to find similarly pronounced words, but for the German language. As the first returned digit can be {{Code|0}}, the value is returned as string.
 
|Computes the [https://de.wikipedia.org/wiki/K%C3%B6lner_Phonetik Kölner Phonetik] value for the specified string. Similar to Soundex, the algorithm is used to find similarly pronounced words, but for the German language. As the first returned digit can be {{Code|0}}, the value is returned as string.
|-
+
|- valign="top"
 
| '''Examples'''
 
| '''Examples'''
 
|
 
|
 
* <code>string:cologne-phonetic("Michael")</code> returns {{Code|645}}
 
* <code>string:cologne-phonetic("Michael")</code> returns {{Code|645}}
 
* <code>every $s in ("Mayr", "Maier", "Meier") satisfies string:cologne-phonetic($s) = "67"</code> returns {{Code|true}}
 
* <code>every $s in ("Mayr", "Maier", "Meier") satisfies string:cologne-phonetic($s) = "67"</code> returns {{Code|true}}
 +
|}
 +
 +
=Formatting=
 +
 +
{{Announce|The functions in this section have been adopted from the obsolete Output Module.}}
 +
 +
==string:format==
 +
 +
{| width='100%'
 +
|- valign="top"
 +
| width='120' | '''Signatures'''
 +
|{{Func|string:format|$format as xs:string, $items as item() ...|xs:string}}<br />
 +
|- valign="top"
 +
| '''Summary'''
 +
|Returns a formatted string. The remaining arguments specified by {{Code|$items}} are applied to the {{Code|$format}} string, according to [https://docs.oracle.com/javase/8/docs/api/java/util/Formatter.html#syntax Java’s printf syntax].
 +
|- valign="top"
 +
| '''Errors'''
 +
|{{Error|format|#Errors}} The specified format is not valid.
 +
|- valign="top"
 +
| '''Examples'''
 +
|
 +
* {{Code|string:format("%b", true())}} returns {{Code|true}}.
 +
* {{Code|string:format("%06d", 256)}} returns {{Code|000256}}.
 +
* {{Code|string:format("%e", 1234.5678)}} returns {{Code|1.234568e+03}}.
 +
|}
 +
 +
==string:cr==
 +
 +
{| width='100%'
 +
|- valign="top"
 +
| width='120' | '''Signatures'''
 +
|{{Code|'''string:cr()''' as xs:string}}
 +
|- valign="top"
 +
| '''Summary'''
 +
|Returns a single carriage return character ({{Code|&amp;#13;}}).
 +
|}
 +
 +
==string:nl==
 +
 +
{| width='100%'
 +
|- valign="top"
 +
| width='120' | '''Signatures'''
 +
|{{Code|'''string:nl()''' as xs:string}}
 +
|- valign="top"
 +
| '''Summary'''
 +
|Returns a single newline character ({{Code|&amp;#10;}}).
 +
|}
 +
 +
==string:tab==
 +
 +
{| width='100%'
 +
|- valign="top"
 +
| width='120' | '''Signatures'''
 +
|{{Code|'''string:tab()''' as xs:string}}
 +
|- valign="top"
 +
| '''Summary'''
 +
|Returns a single tabulator character ({{Code|&amp;#9;}}).
 
|}
 
|}
  
Line 68: Line 125:
 
;Version 10.0
 
;Version 10.0
 
* Updated: Renamed from ''Strings Module'' to ''String Module''. The namespace URI has been updated as well.
 
* Updated: Renamed from ''Strings Module'' to ''String Module''. The namespace URI has been updated as well.
 +
* Updated: {{Function||string:format}}, {{Function||string:cr}}, {{Function||string:nl}} and {{Function||string:tab}} adopted from the obsolete Output Module.
  
The Module was introduced with Version 8.3.
+
The Module was introduced with Version 8.3. Functions were adopted from the obsolete Utility and Output Modules.

Revision as of 13:36, 20 July 2022

Updated with Version 10: Renamed from Strings Module to String Module. The namespace URI has been updated as well.

This XQuery Module contains functions for string operations and computations.

Conventions

All functions and errors in this module are assigned to the http://basex.org/modules/string namespace, which is statically bound to the string prefix.

Computations

string:levenshtein

Signatures string:levenshtein($string1 as xs:string, $string2 as xs:string) as xs:double
Summary Computes the Damerau-Levenshtein Distance for two strings and returns a double value (0.0 - 1.0). The returned value is computed as follows:
  • 1.0 – distance / max(length of strings)
  • 1.0 is returned if the strings are equal; 0.0 is returned if the strings are too different.
Examples
  • string:levenshtein("flower", "flower") returns 1
  • string:levenshtein("flower", "lewes") returns 0.5
  • In the following query, the input is first normalized (words are stemmed, converted to lower case, and diacritics are removed). It returns 1:

<syntaxhighlight lang="xquery">
let $norm := ft:normalize(?, map { 'stemming': true() })
return string:levenshtein($norm("HOUSES"), $norm("house"))
</syntaxhighlight>
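The second example above follows directly from the formula: three edits turn "flower" into "lewes" (delete "f", replace "o" with "e", replace "r" with "s"), and the longer string has six characters, so the result is 1.0 – 3 / 6 = 0.5. The query below is a minimal sketch of how the returned value might be used to rank candidate words by similarity. The query term and the candidate list are made up for illustration:

<syntaxhighlight lang="xquery">
(: Rank hypothetical candidate words by their similarity to a query term. :)
let $query := "flower"
let $candidates := ("flower", "lewes", "flour", "tower")
for $word in $candidates
(: 1.0 means equal strings; smaller values mean less similar strings :)
let $score := string:levenshtein($query, $word)
order by $score descending
return $word || ": " || $score
</syntaxhighlight>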

string:soundex

Signatures string:soundex($string as xs:string) as xs:string
Summary Computes the Soundex value for the specified string. The algorithm can be used to find and index English words with similar pronunciation.
Examples
  • string:soundex("Michael") returns M240
  • string:soundex("OBrien") = string:soundex("O'Brien") returns true
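Because similar-sounding words share a Soundex value, the value can serve as a grouping key. The following query is a sketch under that assumption. The name list and the <code>group</code> element are illustrative only:

<syntaxhighlight lang="xquery">
(: Group hypothetical names by their Soundex value. :)
for $name in ("Robert", "Rupert", "Rubin", "Michael")
group by $code := string:soundex($name)
(: after grouping, $name holds all names that share the same code :)
return <group code="{ $code }">{ string-join($name, ", ") }</group>
</syntaxhighlight>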

string:cologne-phonetic

Signatures string:cologne-phonetic($string as xs:string) as xs:string
Summary Computes the Kölner Phonetik value for the specified string. Similar to Soundex, the algorithm is used to find similarly pronounced words, but for the German language. As the first returned digit can be 0, the value is returned as string.
Examples
  • string:cologne-phonetic("Michael") returns 645
  • every $s in ("Mayr", "Maier", "Meier") satisfies string:cologne-phonetic($s) = "67" returns true
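A possible use is a phonetic lookup: the search term and each stored name are reduced to their Kölner Phonetik value, and the values are compared as strings. The names in this sketch are examples, not data from the documentation:

<syntaxhighlight lang="xquery">
(: Return all hypothetical stored names that sound like the search term. :)
let $names := ("Mayr", "Maier", "Meier", "Schmidt")
let $key := string:cologne-phonetic("Meyer")
(: compare as strings, since the value may start with the digit 0 :)
return $names[string:cologne-phonetic(.) = $key]
</syntaxhighlight>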

Formatting

The functions in this section have been adopted from the obsolete Output Module.

string:format

Signatures string:format($format as xs:string, $items as item() ...) as xs:string
Summary Returns a formatted string. The remaining arguments specified by $items are applied to the $format string, according to Java’s printf syntax.
Errors format: The specified format is not valid.
Examples
  • string:format("%b", true()) returns true.
  • string:format("%06d", 256) returns 000256.
  • string:format("%e", 1234.5678) returns 1.234568e+03.
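The sketch below passes several items to one format string and guards a second call against an invalid format. It assumes that an unknown conversion such as <code>%z</code> triggers the error listed above. The error is caught with a wildcard here rather than by name:

<syntaxhighlight lang="xquery">
(: Apply two items to one format string; the values are illustrative. :)
string:format("%s has %d items", "cart", 3),

(: "%z" is not a valid conversion in Java's printf syntax,
   so this call is expected to raise an error. :)
try {
  string:format("%z", 1)
} catch * {
  "Invalid format: " || $err:description
}
</syntaxhighlight>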

string:cr

Signatures string:cr() as xs:string
Summary Returns a single carriage return character (&#13;).

string:nl

Signatures string:nl() as xs:string
Summary Returns a single newline character (&#10;).

string:tab

Signatures string:tab() as xs:string
Summary Returns a single tabulator character (&#9;).
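The three functions are convenient as separators in <code>string-join</code>. The following sketch builds tab-separated text, one row per line. The row data is invented for illustration:

<syntaxhighlight lang="xquery">
(: Build tab-separated text from hypothetical rows, one row per line. :)
let $rows := (["id", "name"], ["1", "flower"], ["2", "lewes"])
return string-join(
  for $row in $rows
  (: $row?* yields the members of the array as a sequence :)
  return string-join($row?*, string:tab()),
  string:nl()
)
</syntaxhighlight>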

Changelog

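Version 10.0
  • Updated: Renamed from Strings Module to String Module. The namespace URI has been updated as well.
  • Updated: string:format, string:cr, string:nl and string:tab adopted from the obsolete Output Module.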

The Module was introduced with Version 8.3. Functions were adopted from the obsolete Utility and Output Modules.