Construction of an idiom corpus and its application to idiom identification based on WSD incorporating idiom-specific features

Chikara Hashimoto*, Daisuke Kawahara

*Corresponding author for this work

Research output: Peer-reviewed

16 citations (Scopus)

Abstract

Some phrases can be interpreted either idiomatically (figuratively) or literally depending on context, and precise identification of idioms is indispensable for full-fledged natural language processing (NLP). To this end, we have constructed an idiom corpus for Japanese. This paper reports on the corpus and on the results of an idiom identification experiment using it. The corpus targets 146 ambiguous idioms and consists of 102,846 sentences, each annotated with a literal/idiom label. For idiom identification, we targeted 90 of the 146 idioms and adopted a word sense disambiguation (WSD) method using both common WSD features and idiom-specific features. As far as we know, both the corpus and the experiment are the largest of their kind. We found that a standard supervised WSD method works well for idiom identification, achieving an accuracy of 89.25% with idiom-specific features and 88.86% without them, and that the most effective idiom-specific feature is the one involving the adjacency of idiom constituents.
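To make the distinction between common WSD features and an adjacency-based idiom-specific feature concrete, here is a minimal sketch, assuming pre-segmented tokens and a window-based bag-of-words as the "common" features. The function name `extract_features`, the `window` parameter, and the feature names are hypothetical illustrations, not the paper's actual feature set. The example idiom 足を洗う ("wash one's feet" literally, "quit a disreputable way of life" figuratively) is used here only for illustration.

```python
from collections import Counter


def extract_features(tokens, constituents, window=2):
    """Hypothetical feature extractor for one sentence (illustrative sketch).

    tokens: surface tokens of the sentence, assumed already segmented.
    constituents: the idiom's constituent tokens in canonical order,
        e.g. ["足", "を", "洗う"].
    Returns an empty Counter if the constituents are not all found in order.
    """
    features = Counter()

    # Find each constituent in order, searching after the previous match.
    positions = []
    start = 0
    for c in constituents:
        try:
            idx = tokens.index(c, start)
        except ValueError:
            return features  # idiom constituents not present in order
        positions.append(idx)
        start = idx + 1

    # Assumed "common" WSD features: bag of words within a window around
    # the idiom span, including any words inserted between constituents.
    first, last = positions[0], positions[-1]
    span = set(positions)
    for i in range(max(0, first - window), min(len(tokens), last + window + 1)):
        if i not in span:
            features[f"bow={tokens[i]}"] += 1

    # Assumed idiom-specific feature: 1 if the constituents are contiguous.
    adjacent = all(b - a == 1 for a, b in zip(positions, positions[1:]))
    features["adjacent"] = int(adjacent)
    return features
```

For 彼 は 足 を 洗う the constituents are contiguous and `adjacent` is 1, while for 彼 は 足 を きれいに 洗う ("washes his feet clean") an inserted word breaks adjacency and `adjacent` is 0. In a supervised setup, feature vectors like these from the literal/idiom-labelled sentences would be fed to a standard classifier; that training step is not shown here.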

Original language: English
Pages: 992-1001
Number of pages: 10
DOI
Publication status: Published - 2008
Externally published: Yes
Event: 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP 2008, Co-located with AMTA 2008 and the International Workshop on Spoken Language Translation - Honolulu, HI, United States
Duration: 25 October 2008 to 27 October 2008

Conference

Conference: 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP 2008, Co-located with AMTA 2008 and the International Workshop on Spoken Language Translation
Country/Territory: United States
City: Honolulu, HI
Period: 25/10/08 to 27/10/08

ASJC Scopus subject areas

  • Information Systems
  • Computational Theory and Mathematics
  • Computer Science Applications

