Automatic acquisition of basic Katakana lexicon from a given corpus

Toshiaki Nakazawa*, Daisuke Kawahara, Sadao Kurohashi

*Corresponding author for this work

Research output: Conference contribution

3 Citations (Scopus)

Abstract

Katakana, the Japanese phonogram script used mainly for loan words, is a troublemaker in Japanese word segmentation. Because Katakana words are heavily domain-dependent and Katakana neologisms are numerous, it is almost impossible to construct and maintain a Katakana word dictionary by hand. This paper proposes an automatic segmentation method for Japanese Katakana compounds that makes it possible to construct a precise and concise Katakana word dictionary automatically, given only a medium- or large-sized Japanese corpus of some domain.

Original language: English
Host publication title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 682-693
Number of pages: 12
DOI
Publication status: Published - 2005
Externally published: Yes
Event: 2nd International Joint Conference on Natural Language Processing, IJCNLP 2005 - Jeju Island, Korea, Republic of
Duration: 11 Oct 2005 to 13 Oct 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3651 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Joint Conference on Natural Language Processing, IJCNLP 2005
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 2005/10/11 to 2005/10/13

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
