Multi-Class Composite N-gram based on connection direction

Hirofumi Yamamoto*, Yoshinori Sagisaka

*Corresponding author for this work

Research output: Contribution to journalConference articlepeer-review

36 Citations (Scopus)

Abstract

A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting word neighboring characteristics into word-preceding and following directions, multiple (two-dimensional) word classes are assigned to each word. In each side, word classes are merged into larger clusters independently according to preceding or following word distributions. This word-clustering can provide more efficient and statistically reliable word clusters. Further, we extend it to Multi-Class Composite N-gram that unit is Multi-Class 2-gram and joined word. Multi-Class Composite N-gram showed better performance both in perplexity and recognition rates with one thousandth smaller size than conventional word 2-grams.

Original languageEnglish
Pages (from-to)533-536
Number of pages4
JournalICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume1
Publication statusPublished - 1999 Jan 1
Externally publishedYes
EventProceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-99) - Phoenix, AZ, USA
Duration: 1999 Mar 151999 Mar 19

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Multi-Class Composite N-gram based on connection direction'. Together they form a unique fingerprint.

Cite this