Difference between revisions of "String Module"
m (Text replacement - "<syntaxhighlight lang="xquery">" to "<pre lang='xquery'>") |
|||
(21 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
− | This [[Module Library|XQuery Module]] contains functions for string computations. | + | This [[Module Library|XQuery Module]] contains functions for string operations and computations. |
=Conventions= | =Conventions= | ||
− | All functions in this module and errors are assigned to the <code><nowiki>http://basex.org/modules/ | + | All functions and errors in this module are assigned to the <code><nowiki>http://basex.org/modules/string</nowiki></code> namespace, which is statically bound to the {{Code|string}} prefix.<br/> |
− | = | + | =Computations= |
− | == | + | ==string:levenshtein== |
{| width='100%' | {| width='100%' | ||
− | |- | + | |- valign="top" |
− | | width='120' | ''' | + | | width='120' | '''Signature''' |
− | | | + | |<pre>string:levenshtein( |
− | |- | + | $string1 as xs:string, |
+ | $string2 as xs:string | ||
+ | ) as xs:double</pre> | ||
+ | |- valign="top" | ||
| '''Summary''' | | '''Summary''' | ||
− | |Computes the [https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance Damerau-Levenshtein Distance] for two strings and returns a double value ({{Code|0.0}} - {{Code|1.0}}). The | + | |Computes the [https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance Damerau-Levenshtein Distance] for two strings and returns a double value ({{Code|0.0}} - {{Code|1.0}}). The returned value is computed as follows:<br/> |
− | * <code>1.0 | + | * <code>1.0</code> – distance / max(length of strings) |
* <code>1.0</code> is returned if the strings are equal; <code>0.0</code> is returned if the strings are too different. | * <code>1.0</code> is returned if the strings are equal; <code>0.0</code> is returned if the strings are too different. | ||
− | |- | + | |- valign="top" |
| '''Examples''' | | '''Examples''' | ||
| | | | ||
− | * {{Code| | + | * {{Code|string:levenshtein("flower", "flower")}} returns {{Code|1}} |
− | * {{Code| | + | * {{Code|string:levenshtein("flower", "lewes")}} returns {{Code|0.5}} |
* In the following query, the input is first normalized (words are stemmed, converted to lower case, and diacritics are removed). It returns {{Code|1}}: | * In the following query, the input is first normalized (words are stemmed, converted to lower case, and diacritics are removed). It returns {{Code|1}}: | ||
− | <pre | + | <pre lang='xquery'> |
let $norm := ft:normalize(?, map { 'stemming': true() }) | let $norm := ft:normalize(?, map { 'stemming': true() }) | ||
− | return | + | return string:levenshtein($norm("HOUSES"), $norm("house")) |
</pre> | </pre> | ||
|} | |} | ||
− | == | + | ==string:soundex== |
{| width='100%' | {| width='100%' | ||
− | |- | + | |- valign="top" |
− | | width='120' | ''' | + | | width='120' | '''Signature''' |
− | | | + | |<pre>string:soundex( |
− | |- | + | $string as xs:string |
+ | ) as xs:string</pre> | ||
+ | |- valign="top" | ||
| '''Summary''' | | '''Summary''' | ||
|Computes the [https://en.wikipedia.org/wiki/Soundex Soundex] value for the specified string. The algorithm can be used to find and index English words with similar pronunciation. | |Computes the [https://en.wikipedia.org/wiki/Soundex Soundex] value for the specified string. The algorithm can be used to find and index English words with similar pronunciation. |
− | |- | + | |- valign="top" |
| '''Examples''' | | '''Examples''' | ||
| | | | ||
− | * <code> | + | * <code>string:soundex("Michael")</code> returns {{Code|M240}} |
− | * <code> | + | * <code>string:soundex("OBrien") = string:soundex("O'Brien")</code> returns {{Code|true}} |
|} | |} | ||
− | == | + | ==string:cologne-phonetic== |
{| width='100%' | {| width='100%' | ||
− | |- | + | |- valign="top" |
− | | width='120' | ''' | + | | width='120' | '''Signature''' |
− | | | + | |<pre>string:cologne-phonetic( |
− | |- | + | $string as xs:string |
+ | ) as xs:string</pre> | ||
+ | |- valign="top" | ||
| '''Summary''' | | '''Summary''' | ||
− | |Computes the [https://de.wikipedia.org/wiki/K%C3%B6lner_Phonetik Kölner Phonetik] value for the specified string | + | |Computes the [https://de.wikipedia.org/wiki/K%C3%B6lner_Phonetik Kölner Phonetik] value for the specified string. Similar to Soundex, the algorithm is used to find similarly pronounced words, but for the German language. As the first returned digit can be {{Code|0}}, the value is returned as a string. |
− | |- | + | |- valign="top" |
| '''Examples''' | | '''Examples''' | ||
| | | | ||
− | * <code> | + | * <code>string:cologne-phonetic("Michael")</code> returns {{Code|645}} |
− | * <code>every $s in ("Mayr", "Maier", "Meier") satisfies | + | * <code>every $s in ("Mayr", "Maier", "Meier") satisfies string:cologne-phonetic($s) = "67"</code> returns {{Code|true}} |
+ | |} | ||
+ | |||
+ | =Formatting= | ||
+ | |||
+ | ==string:format== | ||
+ | |||
+ | {| width='100%' | ||
+ | |- valign="top" | ||
+ | | width='120' | '''Signature''' | ||
+ | |<pre>string:format( | ||
+ | $pattern as xs:string, | ||
+ | $values... as item() | ||
+ | ) as xs:string</pre> | ||
+ | |- valign="top" | ||
+ | | '''Summary''' | ||
+ | |Returns a formatted string. The remaining {{Code|$values}} are incorporated into the {{Code|$pattern}}, according to [https://docs.oracle.com/javase/8/docs/api/java/util/Formatter.html#syntax Java’s printf syntax]. | ||
+ | |- valign="top" | ||
+ | | '''Errors''' | ||
+ | |{{Error|format|#Errors}} The specified format is not valid. | ||
+ | |- valign="top" | ||
+ | | '''Examples''' | ||
+ | | | ||
+ | * {{Code|string:format("%b", true())}} returns {{Code|true}}. | ||
+ | * {{Code|string:format("%06d", 256)}} returns {{Code|000256}}. | ||
+ | * {{Code|string:format("%e", 1234.5678)}} returns {{Code|1.234568e+03}}. | ||
+ | |} | ||
+ | |||
+ | ==string:cr== | ||
+ | |||
+ | {| width='100%' | ||
+ | |- valign="top" | ||
+ | | width='120' | '''Signature''' | ||
+ | |{{Code|'''string:cr()''' as xs:string}} | ||
+ | |- valign="top" | ||
+ | | '''Summary''' | ||
+ | |Returns a single carriage return character ({{Code|&#13;}}). | ||
+ | |} | ||
+ | |||
+ | ==string:nl== | ||
+ | |||
+ | {| width='100%' | ||
+ | |- valign="top" | ||
+ | | width='120' | '''Signature''' | ||
+ | |{{Code|'''string:nl()''' as xs:string}} | ||
+ | |- valign="top" | ||
+ | | '''Summary''' | ||
+ | |Returns a single newline character ({{Code|&#10;}}). | ||
+ | |} | ||
+ | |||
+ | ==string:tab== | ||
+ | |||
+ | {| width='100%' | ||
+ | |- valign="top" | ||
+ | | width='120' | '''Signature''' | ||
+ | |{{Code|'''string:tab()''' as xs:string}} | ||
+ | |- valign="top" | ||
+ | | '''Summary''' | ||
+ | |Returns a single tabulator character ({{Code|&#9;}}). | ||
|} | |} | ||
=Changelog= | =Changelog= | ||
− | The Module | + | ;Version 10.0 |
+ | * Updated: Renamed from ''Strings Module'' to ''String Module''. The namespace URI has been updated as well. | ||
+ | * Updated: {{Function||string:format}}, {{Function||string:cr}}, {{Function||string:nl}} and {{Function||string:tab}} adopted from the obsolete Output Module. | ||
− | + | The Module was introduced with Version 8.3. Functions were adopted from the obsolete Utility and Output Modules. |
Latest revision as of 18:36, 1 December 2023
This XQuery Module contains functions for string operations and computations.
Conventions
All functions and errors in this module are assigned to the http://basex.org/modules/string namespace, which is statically bound to the string prefix.
Computations
string:levenshtein
Signature | string:levenshtein( $string1 as xs:string, $string2 as xs:string ) as xs:double |
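Summary | Computes the Damerau-Levenshtein Distance for two strings and returns a double value (0.0 - 1.0). The returned value is computed as follows:
- 1.0 - distance / max(length of strings)
- 1.0 is returned if the strings are equal; 0.0 is returned if the strings are too different.
|
Examples |
- string:levenshtein("flower", "flower") returns 1
- string:levenshtein("flower", "lewes") returns 0.5
- In the following query, the input is first normalized (words are stemmed, converted to lower case, and diacritics are removed). It returns 1: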
let $norm := ft:normalize(?, map { 'stemming': true() })
return string:levenshtein($norm("HOUSES"), $norm("house"))
|
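The formula above can be traced by hand with a short Python sketch. This is an illustration only, not BaseX's implementation: it assumes the "optimal string alignment" variant of the Damerau-Levenshtein distance, the helper names `osa_distance` and `similarity` are hypothetical, and it does not model any cutoff BaseX may apply for strings that are "too different".

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions and
    transpositions of adjacent characters (optimal string alignment,
    an assumed variant of Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def similarity(a: str, b: str) -> float:
    """1.0 - distance / max(length of strings), as in the summary above."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as equal
    return 1.0 - osa_distance(a, b) / longest


print(similarity("flower", "flower"))  # 1.0
print(similarity("flower", "lewes"))   # 0.5 (distance 3, longest string 6)
```

For "flower" and "lewes", three edits are needed (drop "f", replace "o" with "e", replace "r" with "s"), so the result is 1.0 - 3 / 6 = 0.5, which matches the documented example.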
string:soundex
Signature | string:soundex( $string as xs:string ) as xs:string |
Summary | Computes the Soundex value for the specified string. The algorithm can be used to find and index English words with similar pronunciation. |
Examples |
- string:soundex("Michael") returns M240
- string:soundex("OBrien") = string:soundex("O'Brien") returns true
|
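To show how such a code comes about, here is a minimal Python sketch of the common American Soundex rules. It is an assumption that BaseX follows exactly these rules in every edge case; the function name `soundex` and the table `CODES` are chosen for this illustration only.

```python
# Digit assigned to each consonant group; vowels, H, W and Y carry no digit.
CODES = {}
for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in group:
        CODES[letter] = digit


def soundex(word: str) -> str:
    """First letter plus three digits, padded with zeros."""
    # Keep ASCII letters only, so "O'Brien" and "OBrien" are treated alike.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in ("H", "W"):
            continue  # H and W do not separate two equal codes
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the previous code
    return (result + "000")[:4]


print(soundex("Michael"))                       # M240
print(soundex("OBrien") == soundex("O'Brien"))  # True
```

In "Michael", "M" is kept as the first letter, "c" yields 2, "l" yields 4, and the result is padded with a zero.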
string:cologne-phonetic
Signature | string:cologne-phonetic( $string as xs:string ) as xs:string |
Summary | Computes the Kölner Phonetik value for the specified string. Similar to Soundex, the algorithm is used to find similarly pronounced words, but for the German language. As the first returned digit can be 0, the value is returned as a string.
|
Examples |
- string:cologne-phonetic("Michael") returns 645
- every $s in ("Mayr", "Maier", "Meier") satisfies string:cologne-phonetic($s) = "67" returns true
|
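The following Python sketch walks through the commonly published Kölner Phonetik rules: encode each letter depending on its neighbours, collapse repeated digits, then drop every 0 except a leading one. It is not BaseX's code; the function name `cologne_phonetic` and the treatment of umlauts and non-letters are assumptions made for this illustration.

```python
def cologne_phonetic(word: str) -> str:
    """Kölner Phonetik code of a word, following the commonly published rules."""
    text = word.upper().replace("Ä", "A").replace("Ö", "O").replace("Ü", "U")
    letters = [c for c in text if "A" <= c <= "Z"]  # assumption: drop other characters
    codes = []
    for i, ch in enumerate(letters):
        prev = letters[i - 1] if i > 0 else ""
        nxt = letters[i + 1] if i + 1 < len(letters) else ""
        if ch == "H":
            continue  # H has no code
        if ch in set("AEIJOUY"):
            code = "0"
        elif ch == "B":
            code = "1"
        elif ch == "P":
            code = "3" if nxt == "H" else "1"
        elif ch in ("D", "T"):
            code = "8" if nxt in ("C", "S", "Z") else "2"
        elif ch in ("F", "V", "W"):
            code = "3"
        elif ch in ("G", "K", "Q"):
            code = "4"
        elif ch == "C":
            if i == 0:
                code = "4" if nxt in set("AHKLOQRUX") else "8"
            elif prev in ("S", "Z"):
                code = "8"
            else:
                code = "4" if nxt in set("AHKOQUX") else "8"
        elif ch == "X":
            code = "8" if prev in ("C", "K", "Q") else "48"
        elif ch == "L":
            code = "5"
        elif ch in ("M", "N"):
            code = "6"
        elif ch == "R":
            code = "7"
        else:  # S, Z
            code = "8"
        codes.append(code)
    digits = "".join(codes)
    # Collapse runs of the same digit.
    collapsed = ""
    for d in digits:
        if not collapsed or collapsed[-1] != d:
            collapsed += d
    # Remove every 0 except a leading one.
    return collapsed[:1] + collapsed[1:].replace("0", "")


print(cologne_phonetic("Michael"))  # 645
print(all(cologne_phonetic(s) == "67" for s in ("Mayr", "Maier", "Meier")))  # True
```

The last step explains why the value is a string: a word that starts with a vowel keeps its leading 0, which a numeric type would lose.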
Formatting
string:format
Signature | string:format( $pattern as xs:string, $values... as item() ) as xs:string |
Summary | Returns a formatted string. The remaining $values are incorporated into the $pattern, according to Java’s printf syntax.
|
Errors | format: The specified format is not valid.
|
Examples |
- string:format("%b", true()) returns true.
- string:format("%06d", 256) returns 000256.
- string:format("%e", 1234.5678) returns 1.234568e+03.
|
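For readers more familiar with printf-style formatting in other languages, the numeric examples above behave the same way with Python's `%` operator. This is only an analogy: the helper `format_values` is hypothetical, and Java-specific conversions such as `%b` are not available in Python's string formatting.

```python
def format_values(pattern: str, *values: object) -> str:
    """Hypothetical stand-in: applies Python's printf-style formatting."""
    return pattern % values


print(format_values("%06d", 256))      # 000256: zero-padded to a width of 6
print(format_values("%e", 1234.5678))  # 1.234568e+03: scientific notation
```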
string:cr
Signature | string:cr() as xs:string
|
Summary | Returns a single carriage return character (&#13;).
|
string:nl
Signature | string:nl() as xs:string
|
Summary | Returns a single newline character (&#10;).
|
string:tab
Signature | string:tab() as xs:string
|
Summary | Returns a single tabulator character (&#9;).
|
Changelog
- Version 10.0
- Updated: Renamed from Strings Module to String Module. The namespace URI has been updated as well.
- Updated: string:format, string:cr, string:nl and string:tab adopted from the obsolete Output Module.
The Module was introduced with Version 8.3. Functions were adopted from the obsolete Utility and Output Modules.