</pre>
By default, unless one of the language codes "<code>ja</code>", "<code>ar</code>", "<code>ko</code>", "<code>th</code>", or "<code>zh</code>" is specified, texts are tokenized with a tokenizer designed for Western languages:
* Whitespace characters are treated as token delimiters.