Multi-Class Composite N-gram based on connection direction

Hirofumi Yamamoto, Yoshinori Sagisaka

研究成果: Conference article

32 被引用数 (Scopus)


A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting word neighboring characteristics into word-preceding and following directions, multiple (two-dimensional) word classes are assigned to each word. In each side, word classes are merged into larger clusters independently according to preceding or following word distributions. This word-clustering can provide more efficient and statistically reliable word clusters. Further, we extend it to Multi-Class Composite N-gram that unit is Multi-Class 2-gram and joined word. Multi-Class Composite N-gram showed better performance both in perplexity and recognition rates with one thousandth smaller size than conventional word 2-grams.

ジャーナルICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
出版ステータスPublished - 1999 1 1
イベントProceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-99) - Phoenix, AZ, USA
継続期間: 1999 3 151999 3 19

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

フィンガープリント 「Multi-Class Composite N-gram based on connection direction」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。