TY - JOUR
T1 - Mutual information based dynamic integration of multiple feature streams for robust real-time LVCSR
AU - Sato, Shoei
AU - Kobayashi, Akio
AU - Onoe, Kazuo
AU - Homma, Shinichi
AU - Imai, Toru
AU - Takagi, Tohru
AU - Kobayashi, Tetsunori
PY - 2008/3
Y1 - 2008/3
N2 - We present a novel method of integrating the likelihoods of multiple feature streams, representing different acoustic aspects, for robust speech recognition. The integration algorithm dynamically calculates a frame-wise stream weight so that a higher weight is given to a stream that is robust to a variety of noisy environments or speaking styles. Such a robust stream is expected to show discriminative ability. A conventional method proposed for the recognition of spoken digits calculates the weights from the entropy of the whole set of HMM states. This paper extends the dynamic weighting to a real-time large-vocabulary continuous speech recognition (LVCSR) system. The proposed weight is calculated in real time from the mutual information between an input stream and the active HMM states in a search space, without an additional likelihood calculation. Furthermore, the mutual information takes the width of the search space into account by calculating the marginal entropy from the number of active states. In this paper, we integrate three features that are extracted through auditory filters by taking into account the human auditory system's ability to extract amplitude and frequency modulations. As a result, features representing energy, amplitude drift, and resonant frequency drifts are integrated. These features are expected to provide complementary clues for speech recognition. Speech recognition experiments on field reports and spontaneous commentary from Japanese broadcast news showed that the proposed method reduced word errors by 9.2% in field reports and 4.7% in spontaneous commentary relative to the best result obtained from a single stream.
AB - We present a novel method of integrating the likelihoods of multiple feature streams, representing different acoustic aspects, for robust speech recognition. The integration algorithm dynamically calculates a frame-wise stream weight so that a higher weight is given to a stream that is robust to a variety of noisy environments or speaking styles. Such a robust stream is expected to show discriminative ability. A conventional method proposed for the recognition of spoken digits calculates the weights from the entropy of the whole set of HMM states. This paper extends the dynamic weighting to a real-time large-vocabulary continuous speech recognition (LVCSR) system. The proposed weight is calculated in real time from the mutual information between an input stream and the active HMM states in a search space, without an additional likelihood calculation. Furthermore, the mutual information takes the width of the search space into account by calculating the marginal entropy from the number of active states. In this paper, we integrate three features that are extracted through auditory filters by taking into account the human auditory system's ability to extract amplitude and frequency modulations. As a result, features representing energy, amplitude drift, and resonant frequency drifts are integrated. These features are expected to provide complementary clues for speech recognition. Speech recognition experiments on field reports and spontaneous commentary from Japanese broadcast news showed that the proposed method reduced word errors by 9.2% in field reports and 4.7% in spontaneous commentary relative to the best result obtained from a single stream.
KW - Active hypotheses
KW - Entropy
KW - Mutual information
KW - Speech recognition
KW - Stream integration
UR - http://www.scopus.com/inward/record.url?scp=68249112698&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=68249112698&partnerID=8YFLogxK
U2 - 10.1093/ietisy/e91-d.3.815
DO - 10.1093/ietisy/e91-d.3.815
M3 - Article
AN - SCOPUS:68249112698
VL - E91-D
SP - 815
EP - 824
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
SN - 0916-8532
IS - 3
ER -