Out-of-vocabulary word recognition using a hierarchical language model based on multiple Markov models

Hirofumi Yamamoto, Hiroaki Kokubo, Genichiro Kikui, Yoshihiko Ogawa, Yoshinori Sagisaka

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

We propose a language model that addresses task-dependent out-of-vocabulary words in speech recognition. Language model adaptation is the standard way to apply a language model to a new task; however, it cannot handle out-of-vocabulary proper names that appear in a task-dependent fashion. We address this issue with a hierarchical language model that uses two independent Markov models to constrain the transition probabilities and the phonetic sequence emission probabilities of out-of-vocabulary words. The emission probabilities of out-of-vocabulary words are thus expressed as a double Markov model that combines both sets of probabilities. We conducted speech recognition experiments on Japanese dialogue data in the appointments domain. For sentences containing one or more out-of-vocabulary words, the approach gives a word accuracy rate of 86.7%, compared to a word accuracy rate of 78.2% when no strategy for out-of-vocabulary words is employed. This corresponds to an elimination of 34.4% of the baseline errors and confirms the effectiveness of the approach.
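The two-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `<oov>` class token, the add-alpha smoothing, the bigram order, and the toy training data are all assumptions made for illustration. An upper word-level Markov model scores the transition into an out-of-vocabulary slot, a lower phoneme-level Markov model scores the phonetic sequence that fills the slot, and the two log probabilities are added.

```python
import math
from collections import defaultdict

OOV = "<oov>"              # hypothetical class token for any out-of-vocabulary word
BOS, EOS = "<s>", "</s>"   # sequence boundary symbols


def train_bigram(sequences, alpha=1.0):
    """Return a log-probability function for an add-alpha smoothed bigram model."""
    counts = defaultdict(lambda: defaultdict(float))
    vocab = {EOS}
    for seq in sequences:
        padded = [BOS] + list(seq) + [EOS]
        vocab.update(padded[1:])
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1

    def log_prob(prev, cur):
        total = sum(counts[prev].values())
        return math.log((counts[prev][cur] + alpha) / (total + alpha * len(vocab)))

    return log_prob


def oov_log_prob(prev_word, phonemes, word_lm, phone_lm):
    """log P(<oov> | prev_word) + log P(phoneme sequence | <oov>)."""
    upper = word_lm(prev_word, OOV)                   # word-level transition
    padded = [BOS] + list(phonemes) + [EOS]
    lower = sum(phone_lm(p, c) for p, c in zip(padded, padded[1:]))  # phoneme level
    return upper + lower


# Toy data (assumed): word sequences with proper names replaced by the class
# token, and phoneme sequences of example names for the lower model.
word_lm = train_bigram([["meet", OOV, "tomorrow"], ["call", OOV, "today"]])
phone_lm = train_bigram([list("tanaka"), list("sato")])

plausible = oov_log_prob("meet", list("tana"), word_lm, phone_lm)
implausible = oov_log_prob("meet", list("nnnn"), word_lm, phone_lm)
print(plausible > implausible)
```

In this sketch a name-like phoneme sequence scores higher than an arbitrary one of the same length, which is the constraint the lower model is meant to provide in the out-of-vocabulary slot.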

Original language: English
Pages (from-to): 55-64
Number of pages: 10
Journal: Electronics and Communications in Japan, Part II: Electronics (English translation of Denshi Tsushin Gakkai Ronbunshi)
Volume: 88
Issue number: 12
Publication status: Published - 2005 Dec 1

Keywords

  • Hierarchical language model
  • Markov model
  • N-gram
  • Out-of-vocabulary words

ASJC Scopus subject areas

  • Physics and Astronomy (all)
  • Computer Networks and Communications
  • Electrical and Electronic Engineering
