Sound source separation for robot audition using deep learning

Kuniaki Noda, Naoya Hashimoto, Kazuhiro Nakadai, Tetsuya Ogata

    Research output: Conference contribution

    3 Citations (Scopus)

    Abstract

    Noise-robust speech recognition is crucial for effective human-machine interaction in real-world environments. Sound source separation (SSS) is one of the most widely used approaches for addressing noise-robust speech recognition by extracting a target speaker's speech signal while suppressing simultaneous unintended signals. However, conventional SSS algorithms, such as independent component analysis or nonlinear principal component analysis, are limited in modeling complex projections with scalability. Moreover, conventional systems required designing an independent subsystem for noise reduction (NR) in addition to the SSS. To overcome these issues, we propose a deep neural network (DNN) framework for modeling the separation function (SF) of an SSS system. By training a DNN to predict clean sound features of a target sound from corresponding multichannel deteriorated sound feature inputs, we enable the DNN to model the SF for extracting the target sound without prior knowledge regarding the acoustic properties of the surrounding environment. Moreover, the same DNN is trained to function simultaneously as an NR filter. Our proposed SSS system is evaluated using an isolated word recognition task and a large vocabulary continuous speech recognition task when either nondirectional or directional noise is accumulated in the target speech. Our evaluation results demonstrate that the DNN performs noticeably better than the baseline approach, especially when directional noise is accumulated with a low signal-to-noise ratio.
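
The training setup described in the abstract (a network that regresses clean target features from multichannel degraded features) can be illustrated with a minimal sketch. This is not the authors' architecture, feature set, or data: the synthetic signals, the per-channel gains, the single hidden layer, and every name below (`make_synthetic_batch`, `predict`, `train_separation_mlp`) are assumptions made only for illustration.

```python
import numpy as np


def make_synthetic_batch(rng, n_frames=512, n_feats=8, n_channels=4, noise_std=0.5):
    """Hypothetical stand-in data: clean target features plus noisy multichannel copies."""
    clean = rng.standard_normal((n_frames, n_feats))
    gains = rng.uniform(0.5, 1.0, size=n_channels)  # assumed per-microphone gains
    channels = [g * clean + noise_std * rng.standard_normal(clean.shape) for g in gains]
    noisy = np.concatenate(channels, axis=1)  # shape: (n_frames, n_channels * n_feats)
    return noisy, clean


def predict(params, noisy):
    """Map multichannel noisy features to an estimate of the clean features."""
    w1, b1, w2, b2 = params
    return np.tanh(noisy @ w1 + b1) @ w2 + b2


def train_separation_mlp(noisy, clean, n_hidden=32, lr=0.1, n_epochs=300, seed=0):
    """Fit a one-hidden-layer regressor with full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n_in, n_out = noisy.shape[1], clean.shape[1]
    w1 = rng.standard_normal((n_in, n_hidden)) / np.sqrt(n_in)
    b1 = np.zeros(n_hidden)
    w2 = rng.standard_normal((n_hidden, n_out)) / np.sqrt(n_hidden)
    b2 = np.zeros(n_out)
    losses = []
    for _ in range(n_epochs):
        hidden = np.tanh(noisy @ w1 + b1)
        err = hidden @ w2 + b2 - clean
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through both layers.
        grad_pred = 2.0 * err / err.size
        grad_hidden = (grad_pred @ w2.T) * (1.0 - hidden ** 2)
        w2 -= lr * (hidden.T @ grad_pred)
        b2 -= lr * grad_pred.sum(axis=0)
        w1 -= lr * (noisy.T @ grad_hidden)
        b1 -= lr * grad_hidden.sum(axis=0)
    return (w1, b1, w2, b2), losses


rng = np.random.default_rng(0)
noisy, clean = make_synthetic_batch(rng)
params, losses = train_separation_mlp(noisy, clean)
print(f"MSE before training: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```

Because the target is the clean feature vector rather than a separated mixture component, the same regression objective covers both separation and noise reduction in this toy setting, which mirrors the abstract's claim only at the level of the training objective.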

    Original language: English
    Title of host publication: IEEE-RAS International Conference on Humanoid Robots
    Publisher: IEEE Computer Society
    Pages: 389-394
    Number of pages: 6
    Volume: 2015-December
    ISBN (Print): 9781479968855
    DOI: https://doi.org/10.1109/HUMANOIDS.2015.7363579
    Publication status: Published - 2015-12-22
    Event: 15th IEEE RAS International Conference on Humanoid Robots, Humanoids 2015 - Seoul, Korea, Republic of
    Duration: 2015-11-03 to 2015-11-05

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Computer Vision and Pattern Recognition
    • Hardware and Architecture
    • Human-Computer Interaction
    • Electrical and Electronic Engineering

  • Cite this

    Noda, K., Hashimoto, N., Nakadai, K., & Ogata, T. (2015). Sound source separation for robot audition using deep learning. In IEEE-RAS International Conference on Humanoid Robots (Vol. 2015-December, pp. 389-394). [7363579] IEEE Computer Society. https://doi.org/10.1109/HUMANOIDS.2015.7363579