TY - GEN
T1 - Automatic acquisition of basic Katakana lexicon from a given corpus
AU - Nakazawa, Toshiaki
AU - Kawahara, Daisuke
AU - Kurohashi, Sadao
PY - 2005
Y1 - 2005
N2 - Katakana, Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain Katakana word dictionary by hand. This paper proposes an automatic segmentation method of Japanese Katakana compounds, which makes it possible to construct precise and concise Katakana word dictionary automatically, given only a medium or large size of Japanese corpus of some domain.
AB - Katakana, Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain Katakana word dictionary by hand. This paper proposes an automatic segmentation method of Japanese Katakana compounds, which makes it possible to construct precise and concise Katakana word dictionary automatically, given only a medium or large size of Japanese corpus of some domain.
UR - http://www.scopus.com/inward/record.url?scp=33645990280&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33645990280&partnerID=8YFLogxK
U2 - 10.1007/11562214_60
DO - 10.1007/11562214_60
M3 - Conference contribution
AN - SCOPUS:33645990280
SN - 3540291725
SN - 9783540291725
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 682
EP - 693
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
T2 - 2nd International Joint Conference on Natural Language Processing, IJCNLP 2005
Y2 - 11 October 2005 through 13 October 2005
ER -