Autoencoder based multi-stream combination for noise robust speech recognition

Sri Harish Mallidi, Tetsuji Ogawa, Karel Vesely, Phani S. Nidadavolu, Hynek Hermansky

    Research output: Conference contribution

    14 Citations (Scopus)

    Abstract

    The performance of automatic speech recognition (ASR) systems degrades rapidly when there is a mismatch between training and test acoustic conditions. Performance can be improved using a multi-stream framework, which combines posterior probabilities from several classifiers (often deep neural networks (DNNs)) trained on different features/streams. Knowledge of each classifier's confidence on a noisy test utterance can help in devising better techniques for posterior combination than the simple sum and product rules [1]. In this work, we propose to use autoencoders, which are multilayer feed-forward neural networks, to estimate this confidence measure. During the training phase, an autoencoder is trained for each stream on TANDEM features extracted from the corresponding DNN. Applying the autoencoder during the testing phase, we show that its reconstruction error is correlated with the robustness of the corresponding stream. These error estimates are then used as confidence measures to combine the posterior probabilities generated by each stream. Experiments on the Aurora4 and BABEL databases indicate significant improvements, especially when training and test acoustic conditions are mismatched.
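The combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how reconstruction errors are turned into combination weights, so the inverse-error weighting below is an assumption made purely for illustration, not the authors' method. The function names (`reconstruction_error`, `stream_weights`, `combine_posteriors`) are hypothetical, and the autoencoder itself is stood in for by precomputed reconstructions.

```python
def reconstruction_error(features, reconstructed):
    """Mean squared error between feature vectors and their reconstructions."""
    total = 0.0
    count = 0
    for x, y in zip(features, reconstructed):
        for a, b in zip(x, y):
            total += (a - b) ** 2
            count += 1
    return total / count


def stream_weights(errors, eps=1e-8):
    """Map per-stream errors to weights that sum to 1.

    ASSUMPTION: inverse-error weighting, so a stream with a lower
    reconstruction error (taken here as more reliable) gets a larger weight.
    """
    inverse = [1.0 / (e + eps) for e in errors]
    norm = sum(inverse)
    return [v / norm for v in inverse]


def combine_posteriors(posteriors, weights):
    """Weighted sum of per-stream posterior vectors for a single frame."""
    n_classes = len(posteriors[0])
    combined = [
        sum(w * p[k] for w, p in zip(weights, posteriors))
        for k in range(n_classes)
    ]
    norm = sum(combined)
    return [c / norm for c in combined]


# Two hypothetical streams: the first reconstructs its features well,
# the second poorly (e.g. because noise corrupts its input).
errors = [0.1, 0.4]
weights = stream_weights(errors)
posteriors = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]]
combined = combine_posteriors(posteriors, weights)
```

In this sketch the first stream receives the larger weight, so the combined posterior leans toward its prediction. Other mappings from error to weight would fit the abstract's description equally well.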

    Original language: English
    Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
    Publisher: International Speech and Communication Association
    Pages: 3551-3555
    Number of pages: 5
    Volume: 2015-January
    Publication status: Published - 2015
    Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
    Duration: 6 Sep 2015 to 10 Sep 2015

    Other

    Other: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
    Country: Germany
    City: Dresden
    Period: 6 Sep 2015 to 10 Sep 2015

    ASJC Scopus subject areas

    • Language and Linguistics
    • Human-Computer Interaction
    • Signal Processing
    • Software
    • Modelling and Simulation


  • Cite this

    Mallidi, S. H., Ogawa, T., Vesely, K., Nidadavolu, P. S., & Hermansky, H. (2015). Autoencoder based multi-stream combination for noise robust speech recognition. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 2015-January, pp. 3551-3555). International Speech and Communication Association.