Bilinear map of filter-bank outputs for DNN-based speech recognition

Tetsuji Ogawa, Kenshiro Ueda, Kouichi Katsurada, Tetsunori Kobayashi, Tsuneo Nitta

    Research output: Conference contribution

    1 Citation (Scopus)

    Abstract

    Filter-bank outputs are extended into tensors to yield precise acoustic features for speech recognition using deep neural networks (DNNs). The filter-bank outputs with temporal contexts form a time-frequency pattern of speech and have been shown to be effective as a feature parameter for DNN-based acoustic models. We attempt to project the filter-bank outputs onto a tensor product space using decorrelation followed by a bilinear map to improve acoustic separability in feature extraction. This extension makes extracting a more precise structure of the time-frequency pattern possible because the bilinear map yields higher-order correlations of features. Experimental comparisons carried out in phoneme recognition demonstrate that the tensor feature provides comparable results to the filter-bank feature, and the fusion of the two features yields an improvement over each feature.
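
The pipeline described in the abstract (decorrelation followed by a bilinear map) can be illustrated with a minimal NumPy sketch. The abstract does not state which decorrelating transform, context width, or dimensionality the authors use, so everything concrete here is an assumption made for illustration: PCA stands in for the decorrelation step, the bilinear map is taken to be the outer product of the decorrelated vector with itself, the input is random data in place of real log filter-bank outputs, and the names `fit_decorrelation` and `bilinear_features` as well as all sizes are hypothetical.

```python
import numpy as np

def fit_decorrelation(X):
    """Estimate a PCA basis that decorrelates the columns of X (n_samples, dim).

    Assumption: PCA is used only as a stand-in for the unspecified
    decorrelation step mentioned in the abstract.
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # sort components by decreasing variance
    return mean, eigvecs[:, order]

def bilinear_features(u, v):
    """Bilinear map (u, v) -> u (x) v, flattened per frame.

    Each output entry is a product u_i * v_j, i.e. a second-order term.
    """
    return np.einsum("ni,nj->nij", u, v).reshape(u.shape[0], -1)

# Hypothetical sizes; random data stands in for log filter-bank outputs.
rng = np.random.default_rng(0)
n_frames, n_bands, context = 200, 8, 5
fbank = rng.normal(size=(n_frames, n_bands))

# Stack neighbouring frames so each row is a small time-frequency patch.
half = context // 2
padded = np.pad(fbank, ((half, half), (0, 0)), mode="edge")
patches = np.stack([padded[t:t + context].ravel() for t in range(n_frames)])

# Decorrelate, keep a few components (6 is arbitrary), then apply the bilinear map.
mean, proj = fit_decorrelation(patches)
z = (patches - mean) @ proj[:, :6]
tensor_feat = bilinear_features(z, z)
print(tensor_feat.shape)  # (200, 36)
```

Because the output dimension grows as the product of the two input dimensions, reducing dimensionality before the bilinear map keeps the tensor feature at a manageable size for DNN input. In this sketch the result could simply be concatenated with the original patches to mimic the feature fusion the abstract reports, though how the authors perform that fusion is not stated here.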

    Original language: English
    Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
    Publisher: International Speech and Communication Association
    Pages: 16-20
    Number of pages: 5
    Volume: 2015-January
    Publication status: Published - 2015
    Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
    Duration: 6 Sep 2015 to 10 Sep 2015

    ASJC Scopus subject areas

    • Language and Linguistics
    • Human-Computer Interaction
    • Signal Processing
    • Software
    • Modelling and Simulation

  • Cite this

    Ogawa, T., Ueda, K., Katsurada, K., Kobayashi, T., & Nitta, T. (2015). Bilinear map of filter-bank outputs for DNN-based speech recognition. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 2015-January, pp. 16-20). International Speech and Communication Association.