Chinese Word Segmentation by Mining Maximized Substrings

Mo Shen, Daisuke Kawahara, Sadao Kurohashi

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Citations (Scopus)

Abstract

A major problem in the field of Chinese word segmentation is the identification of out-of-vocabulary words. We propose a simple yet effective approach for extracting maximized substrings, which provide good estimations of unknown word boundaries. We also develop a new semi-supervised segmentation technique that incorporates retrieved substrings using discriminative learning. The effectiveness of this novel approach is demonstrated through experiments using both in-domain and out-of-domain data.

Original languageEnglish
Title of host publication6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference
EditorsRuslan Mitkov, Jong C. Park
PublisherAsian Federation of Natural Language Processing
Pages171-179
Number of pages9
ISBN (Electronic)9784990734800
Publication statusPublished - 2013
Event6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Nagoya, Japan
Duration: 2013 Oct 14 → …

Publication series

Name6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Proceedings of the Main Conference

Conference

Conference6th International Joint Conference on Natural Language Processing, IJCNLP 2013
Country/TerritoryJapan
CityNagoya
Period13/10/14 → …

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software

Fingerprint

Dive into the research topics of 'Chinese Word Segmentation by Mining Maximized Substrings'. Together they form a unique fingerprint.

Cite this