Dynamic integration of multiple feature streams for robust real-time LVCSR

Shoei Sato, Kazuo Onoe, Kio Kobayashi, Shinichi Homma, Toru Imai, Tohru Takagi, Tetsunori Kobayashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a novel method of integrating the likelihoods of multiple feature streams for robust speech recognition. The integration algorithm dynamically calculates a frame-wise stream weight so that a heavier weight is given to a stream that is robust to a variety of noisy environments or speaking styles. Such a robust stream is expected to have high discriminative ability. The weight is calculated in real time from the mutual information between an input stream and the active HMM states in the search space. In this paper, we describe three features, extracted through auditory filters, that model how the human auditory system extracts amplitude and frequency modulations. These features are expected to provide complementary cues for speech recognition. Speech recognition experiments using field reports and spontaneous commentary from Japanese broadcast news showed that the proposed method reduced word errors by 9% relative to the best result obtained from a single stream.
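The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a uniform prior over the active states, so that the per-frame information a stream carries about the state can be approximated as log N minus the entropy of that stream's state posterior. The function names (`state_posteriors`, `stream_weights`, `integrate`) and the normalization of the weights to sum to 1 are hypothetical choices made for illustration.

```python
import math


def state_posteriors(log_likelihoods):
    """Turn per-state log-likelihoods into posteriors (assumes a uniform prior)."""
    peak = max(log_likelihoods)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(v - peak) for v in log_likelihoods]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def stream_weights(frame_log_likelihoods):
    """Compute one weight per stream for a single frame.

    frame_log_likelihoods[k][s] is the log-likelihood of active state s
    under stream k. A stream whose posterior over the active states is
    sharply peaked (low entropy) receives a heavier weight.
    """
    n_states = len(frame_log_likelihoods[0])
    max_entropy = math.log(n_states)
    # Assumed information measure: log N - H(state | frame), clamped at 0
    # to absorb floating-point rounding.
    gains = [
        max(0.0, max_entropy - entropy(state_posteriors(ll)))
        for ll in frame_log_likelihoods
    ]
    total = sum(gains)
    if total <= 0.0:
        # No stream is informative: fall back to equal weights.
        return [1.0 / len(gains)] * len(gains)
    return [g / total for g in gains]


def integrate(frame_log_likelihoods):
    """Combine streams as a weighted sum of log-likelihoods per state."""
    weights = stream_weights(frame_log_likelihoods)
    n_states = len(frame_log_likelihoods[0])
    return [
        sum(w * ll[s] for w, ll in zip(weights, frame_log_likelihoods))
        for s in range(n_states)
    ]


# Hypothetical frame: stream 0 clearly prefers state 0, stream 1 is flat.
frame = [[-1.0, -9.0, -9.0], [-3.0, -3.0, -3.0]]
weights = stream_weights(frame)
combined = integrate(frame)
```

In this toy frame the flat stream carries almost no information about the state, so nearly all of the weight goes to the peaked stream. A real decoder would recompute the weights at every frame over the hypotheses that are currently active in the search.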

Original language: English
Title of host publication: International Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Pages: 345-348
Number of pages: 4
Publication status: Published - 2007 Dec 1
Event: 8th Annual Conference of the International Speech Communication Association, Interspeech 2007 - Antwerp, Belgium
Duration: 2007 Aug 27 to 2007 Aug 31

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 1
ISSN (Electronic): 1990-9772

Conference

Conference: 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Country: Belgium
City: Antwerp
Period: 07/8/27 to 07/8/31

Keywords

  • Active hypotheses
  • Entropy
  • Mutual information
  • Speech recognition
  • Stream integration

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Modelling and Simulation
  • Linguistics and Language
  • Communication

Cite this

Sato, S., Onoe, K., Kobayashi, K., Homma, S., Imai, T., Takagi, T., & Kobayashi, T. (2007). Dynamic integration of multiple feature streams for robust real-time LVCSR. In International Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007 (pp. 345-348). (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; Vol. 1).