Sound source localization using deep learning models

Nelson Yalta, Kazuhiro Nakadai, Tetsuya Ogata

    Research output: Article

    14 citations (Scopus)

    Abstract

    This study proposes the use of a deep neural network to localize a sound source using an array of microphones in a reverberant environment. During the last few years, applications based on deep neural networks have performed various tasks such as image classification or speech recognition to levels that exceed even human capabilities. In our study, we employ deep residual networks, which have recently shown remarkable performance in image classification tasks even when the training period is shorter than that of other models. Deep residual networks are used to process audio input similar to multiple signal classification (MUSIC) methods. We show that with end-to-end training and generic preprocessing, the performance of deep residual networks not only surpasses the block level accuracy of linear models on nearly clean environments but also shows robustness to challenging conditions by exploiting the time delay on power information.
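The abstract attributes part of the network's robustness to the time delay between microphone signals. As a rough illustration of that cue only, and not of the authors' residual-network method, the sketch below estimates the delay between two channels by brute-force cross-correlation and converts it to a far-field arrival angle. The function names, the two-microphone geometry, the 16 kHz sample rate, the 0.1 m spacing, and the synthetic noise signal are all assumptions made for this example.

```python
import math
import random


def estimate_delay(ref, sig, max_lag):
    """Return the lag (in samples) at which sig best matches ref.

    A positive lag means sig is a delayed copy of ref.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation of ref with sig shifted by `lag` samples.
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(sig):
                score += r * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, sample_rate, mic_spacing, speed_of_sound=343.0):
    """Convert a delay in samples to a far-field arrival angle in degrees."""
    sin_theta = speed_of_sound * lag / (sample_rate * mic_spacing)
    # Clamp to the valid asin domain in case of estimation error.
    sin_theta = max(-1.0, min(1.0, sin_theta))
    return math.degrees(math.asin(sin_theta))


# Hypothetical test signal: white noise that reaches the second
# microphone 3 samples after the first one.
rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(512)]
true_lag = 3
mic_a = source
mic_b = [0.0] * true_lag + source[:-true_lag]

lag = estimate_delay(mic_a, mic_b, max_lag=8)
angle = delay_to_angle(lag, sample_rate=16000, mic_spacing=0.1)
print(f"estimated delay: {lag} samples, angle: {angle:.1f} degrees")
```

A learned model such as the one described in the abstract would infer the direction from the preprocessed array input directly; this hand-written estimator only shows what information an inter-microphone delay carries.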

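    Original language: English
    Pages (from-to): 37-48
    Number of pages: 12
    Journal: Journal of Robotics and Mechatronics
    Volume: 29
    Issue number: 1
    DOI: 10.20965/jrm.2017.p0037
    Publication status: Published - Feb 1, 2017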

    Fingerprint

    Image classification
    Acoustic waves
    Microphones
    Speech recognition
    Time delay
    Deep neural networks
    Deep learning

    ASJC Scopus subject areas

    • Computer Science(all)
    • Electrical and Electronic Engineering

    Cite this

    Sound source localization using deep learning models. / Yalta, Nelson; Nakadai, Kazuhiro; Ogata, Tetsuya.

    In: Journal of Robotics and Mechatronics, Vol. 29, No. 1, 01.02.2017, p. 37-48.

    Yalta, Nelson ; Nakadai, Kazuhiro ; Ogata, Tetsuya. / Sound source localization using deep learning models. In: Journal of Robotics and Mechatronics. 2017 ; Vol. 29, No. 1. pp. 37-48.
    @article{b4d7203d82e2484db49f651bb5e09708,
    title = "Sound source localization using deep learning models",
    keywords = "Deep learning, Deep residual networks, Sound source localization",
    author = "Nelson Yalta and Kazuhiro Nakadai and Tetsuya Ogata",
    year = "2017",
    month = "2",
    day = "1",
    doi = "10.20965/jrm.2017.p0037",
    language = "English",
    volume = "29",
    pages = "37--48",
    journal = "Journal of Robotics and Mechatronics",
    issn = "0915-3942",
    publisher = "Fuji Technology Press",
    number = "1",
    }

    TY - JOUR

    T1 - Sound source localization using deep learning models

    AU - Yalta, Nelson

    AU - Nakadai, Kazuhiro

    AU - Ogata, Tetsuya

    PY - 2017/2/1

    Y1 - 2017/2/1

    AB - This study proposes the use of a deep neural network to localize a sound source using an array of microphones in a reverberant environment. During the last few years, applications based on deep neural networks have performed various tasks such as image classification or speech recognition to levels that exceed even human capabilities. In our study, we employ deep residual networks, which have recently shown remarkable performance in image classification tasks even when the training period is shorter than that of other models. Deep residual networks are used to process audio input similar to multiple signal classification (MUSIC) methods. We show that with end-to-end training and generic preprocessing, the performance of deep residual networks not only surpasses the block level accuracy of linear models on nearly clean environments but also shows robustness to challenging conditions by exploiting the time delay on power information.

    KW - Deep learning

    KW - Deep residual networks

    KW - Sound source localization

    UR - http://www.scopus.com/inward/record.url?scp=85013969406&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85013969406&partnerID=8YFLogxK

    U2 - 10.20965/jrm.2017.p0037

    DO - 10.20965/jrm.2017.p0037

    M3 - Article

    AN - SCOPUS:85013969406

    VL - 29

    SP - 37

    EP - 48

    JO - Journal of Robotics and Mechatronics

    JF - Journal of Robotics and Mechatronics

    SN - 0915-3942

    IS - 1

    ER -