483 bytes added, 15:33, 19 November 2011
This article is linked from the [[Full-Text]] page. It gives some insight into the implementation of the full-text features for Japanese text corpora. The Japanese version is [http://files.basex.org/etc/ja-ft.pdf also available as PDF].
==Introduction==
The lexical analysis of Japanese documents is performed by [http://igo.sourceforge.jp/ Igo], a ''morphological analyzer'' that is used as an external library. Some of the advantages of and reasons for using Igo are:
* its results are compatible with those of the popular MeCab project
* it can also use the dictionaries distributed by the MeCab project
* the morphological analyzer is implemented in Java and is relatively fast
Japanese tokenization will be activated in BaseX if Igo is found in the classpath. Version 0.4.3 of Igo ([http://en.sourceforge.jp/projects/igo/releases/ igo-0.4.3.jar]) is currently included in all distributions of BaseX.

In addition to the library, one of the following dictionary files must be unzipped either into the current directory or into the <code>etc</code> sub-directory of the project’s [[Configuration#Home Directory|Home Directory]]:
* IPA Dictionary: http://files.basex.org/etc/ipadic.zip
* NAIST Dictionary: http://files.basex.org/etc/naistdic.zip

==Lexical Analysis==
The example sentence "私は本を書きました。(I wrote a book.)" is analyzed as follows.
<pre>私は本を書きました。