Changes

Full-Text: Japanese (edit)

Revision as of 17:30, 27 March 2015

125 bytes added , 17:30, 27 March 2015

no edit summary

This article is linked from the [[Full-Text]] page. It gives some insight into the implementation of the full-text features for Japanese text corpora. The Japanese version is [http://files.basex.org/etc/ja-ft.pdf also available as PDF].

Thank you to [http://blog.infinite.jp Toshio HIRAI] for integrating the lexer in BaseX!

==Introduction==

The lexical analysis of Japanese documents is performed by

[http://igo.sourceforge.jp/ Igo]. Igo is a ''morphological analyzer'',

and some of the advantages and reasons for using Igo are:

* it is compatible with the results of "MeCab", a prominent morphological analyzer

* it can use the dictionary distributed by the MeCab project

* the morphological analyzer is implemented in Java and is relatively fast

* NAIST Dictionary: http://files.basex.org/etc/naistdic.zip

==Lexical Analysis==

The example sentence "私は本を書きました。(I wrote a book.)"

morpheme are used in indexing and stemming.

==Parsing==

During indexing and parsing, the input strings are split into single ''tokens''.

* Auxiliary verb

Thus, in the example above, {{Code|私}}, {{Code|本}}, and {{Code|書き}} will be passed to the indexer

for each token.
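The filtering step described above can be illustrated with a minimal Python sketch. It does not call Igo or BaseX: the morpheme list is a hand-written analysis of the example sentence, and the names <code>Morpheme</code>, <code>ANALYSIS</code>, <code>EXCLUDED</code> and <code>indexable_tokens</code> are hypothetical. Of the excluded word classes, only "Auxiliary verb" is taken from the list above; the remaining entries are assumptions made for this illustration.

```python
from typing import NamedTuple


class Morpheme(NamedTuple):
    surface: str     # the text as it appears in the sentence
    word_class: str  # coarse word class (English labels chosen for this sketch)


# Hand-written analysis of "私は本を書きました。". A morphological analyzer
# such as Igo would normally produce this list; it is hard-coded here so
# that the example runs without any dictionary.
ANALYSIS = [
    Morpheme("私", "Noun"),
    Morpheme("は", "Postpositional particle"),
    Morpheme("本", "Noun"),
    Morpheme("を", "Postpositional particle"),
    Morpheme("書き", "Verb"),
    Morpheme("まし", "Auxiliary verb"),
    Morpheme("た", "Auxiliary verb"),
    Morpheme("。", "Mark"),
]

# Word classes that are skipped during indexing. "Auxiliary verb" is named
# in the text; the other two entries are assumptions of this sketch.
EXCLUDED = {"Auxiliary verb", "Postpositional particle", "Mark"}


def indexable_tokens(morphemes):
    """Return the surface forms that would be handed to the indexer."""
    return [m.surface for m in morphemes if m.word_class not in EXCLUDED]


print(indexable_tokens(ANALYSIS))  # ['私', '本', '書き']
```

Dropping these word classes keeps particles and punctuation out of the index, so only the content-bearing morphemes remain.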

==Token Processing==

"Fullwidth" and "Halfwidth" (which is defined by

[http://www.w3.org/TR/xpath-full-text-10/#ftdiacriticsoption Diacritics] Option.

==Stemming==

Stemming in Japanese means to analyze the results of morphological analysis

</pre>

==Wildcards==

The Wildcard option in XQuery Full-Text is available for Japanese as well.

The following example is based on "芥川龍之介" (AKUTAGAWA, Ryunosuke), a prominent Japanese writer whose first name is often spelled "竜之介". The following two queries both return <code>true</code>:
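Separately from the XQuery queries themselves, the idea behind a single-character wildcard can be sketched in Python. This is only an illustration and not how BaseX evaluates wildcards: the helper names <code>wildcard_to_regex</code> and <code>token_matches</code> and the pattern <code>.之介</code> are hypothetical, and the sketch assumes that the given name is already available as one token. It relies on the Full-Text rule that a period matches exactly one arbitrary character.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a pattern in which '.' stands for exactly one character.

    Only the plain period is handled; every other character is matched
    literally. Quantified forms are outside the scope of this sketch.
    """
    parts = ["." if ch == "." else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


def token_matches(token: str, pattern: str) -> bool:
    """Check whether the whole token matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(token) is not None


# Both spellings of the given name differ only in the first character,
# so one pattern with a leading wildcard covers them.
for name in ("龍之介", "竜之介"):
    print(name, token_matches(name, ".之介"))
```

Because the two spellings differ in a single character, one pattern with a wildcard in that position accepts both.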

CG

