Automatic acquisition of basic Katakana lexicon from a given corpus

Toshiaki Nakazawa*, Daisuke Kawahara, Sadao Kurohashi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)

Abstract

Katakana, the Japanese phonogram script used mainly for loanwords, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain a Katakana word dictionary by hand. This paper proposes an automatic segmentation method for Japanese Katakana compounds, which makes it possible to construct a precise and concise Katakana word dictionary automatically, given only a medium or large Japanese corpus from some domain.

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 682-693
Number of pages: 12
Publication status: Published - 2005
Externally published: Yes
Event: 2nd International Joint Conference on Natural Language Processing, IJCNLP 2005 - Jeju Island, Korea, Republic of
Duration: 2005 Oct 11 to 2005 Oct 13

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3651 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Joint Conference on Natural Language Processing, IJCNLP 2005
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 05/10/11 to 05/10/13

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
