Multi-class composite N-gram language model using multipleword clusters and word successions

Shuntaro Isogai, Katsuhiko Shirai, Hirofumi Yamamoto, Yoshinori Sagisaka

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

In this paper, a new language model, the Multi-Class Composite N-gram, is proposed to avoid a data sparseness problem in small amount of training data. The Multi-Class Composite Ngram maintains an accurate word prediction capability and reliability for sparse data with a compact model size based on multiple word clusters, so-called Multi-Classes. In the Multi-Class, the statistical connectivity at each position of the N-grams is regarded as word attributes, and one word cluster each is created to represent positional attributes. Furthermore, by introducing higher order word N-grams through the grouping of frequent word successions, Multi-Class N-grams are extended to Multi-Class Composite N-grams. In experiments, the Multi- Class Composite N-grams result in 9.5% lower perplexity and a 16% lower word error rate in speech recognition with a 40% smaller parameter size than conventional word 3-grams.

Original languageEnglish
Title of host publicationEUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology
PublisherInternational Speech Communication Association
Pages25-28
Number of pages4
ISBN (Electronic)8790834100, 9788790834104
Publication statusPublished - 2001
Externally publishedYes
Event7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001 - Aalborg, Denmark
Duration: 2001 Sep 32001 Sep 7

Other

Other7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001
CountryDenmark
CityAalborg
Period01/9/301/9/7

ASJC Scopus subject areas

  • Communication
  • Linguistics and Language
  • Computer Science Applications
  • Software

Fingerprint Dive into the research topics of 'Multi-class composite N-gram language model using multipleword clusters and word successions'. Together they form a unique fingerprint.

  • Cite this

    Isogai, S., Shirai, K., Yamamoto, H., & Sagisaka, Y. (2001). Multi-class composite N-gram language model using multipleword clusters and word successions. In EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology (pp. 25-28). International Speech Communication Association.