Variational Bayesian multi-channel robust NMF for human-voice enhancement with a deformable and partially-occluded microphone array

Yoshiaki Bando, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Kazuyoshi Yoshii, Hiroshi G. Okuno

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    6 Citations (Scopus)

    Abstract

    This paper presents a human-voice enhancement method for a deformable and partially-occluded microphone array. Microphone arrays distributed along the long bodies of hose-shaped rescue robots are crucial for finding victims under collapsed buildings, but the human voices they capture are contaminated by non-stationary actuator and friction noise. Standard blind source separation methods cannot be used because the relative microphone positions change over time and some microphones are occasionally shaded by rubble. To solve these problems, we develop a Bayesian model that separates multichannel amplitude spectrograms into sparse and low-rank components (human voice and noise, respectively) without using phase information, which depends on the array layout. The voice level at each microphone is estimated in a time-varying manner to reduce the influence of the shaded microphones. Experiments using a 3-m hose-shaped robot with eight microphones show that our method outperforms conventional methods by 2.7 dB in signal-to-noise ratio.
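The decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' variational Bayesian inference: as a simplifying assumption, it computes point estimates with multiplicative updates on a squared-error cost with an L1 penalty on the voice. The function name `multichannel_robust_nmf`, its parameters, and the synthetic data are all hypothetical. Each channel's amplitude spectrogram is modeled as a low-rank noise term plus a shared sparse voice spectrogram scaled by a per-channel, per-frame gain, so a shaded microphone can receive a small gain. No phase information is used.

```python
import numpy as np


def multichannel_robust_nmf(X, n_bases=4, sparsity=0.1, n_iter=200, seed=0):
    """Split amplitude spectrograms X (channels, freqs, frames) into
    per-channel low-rank noise W @ H and a shared sparse voice S that is
    scaled by a per-channel, per-frame gain G.

    Hypothetical simplification: multiplicative updates on a squared-error
    cost with an L1 penalty on S, not the paper's variational Bayes.
    """
    eps = 1e-12
    rng = np.random.default_rng(seed)
    M, F, T = X.shape
    W = rng.random((M, F, n_bases)) + eps  # noise bases, one set per channel
    H = rng.random((M, n_bases, T)) + eps  # noise activations per channel
    S = rng.random((F, T)) + eps           # voice spectrogram shared by channels
    G = np.ones((M, T))                    # time-varying voice gain per channel

    def model():
        # Low-rank noise plus gain-scaled sparse voice, shape (M, F, T).
        return W @ H + G[:, None, :] * S[None, :, :]

    errors = [np.linalg.norm(X - model())]
    for _ in range(n_iter):
        Y = model()
        Ht = H.transpose(0, 2, 1)
        W *= (X @ Ht) / (Y @ Ht + eps)
        Y = model()
        Wt = W.transpose(0, 2, 1)
        H *= (Wt @ X) / (Wt @ Y + eps)
        Y = model()
        g = G[:, None, :]
        # The sparsity weight in the denominator shrinks S toward zero.
        S *= (g * X).sum(axis=0) / ((g * Y).sum(axis=0) + sparsity + eps)
        Y = model()
        G *= (S[None] * X).sum(axis=1) / ((S[None] * Y).sum(axis=1) + eps)
        # Resolve the scale ambiguity between G and S: the largest gain in
        # each frame is set to (about) 1 and S absorbs the scale.
        scale = G.max(axis=0) + eps
        G /= scale
        S *= scale
        errors.append(np.linalg.norm(X - model()))
    return W, H, S, G, errors


# Synthetic example: 3 channels, rank-2 noise, sparse voice, and a third
# channel that is "occluded" (low voice gain) in the second half.
rng = np.random.default_rng(1)
M, F, T = 3, 16, 40
noise = rng.random((M, F, 2)) @ rng.random((M, 2, T))
voice = rng.random((F, T)) * (rng.random((F, T)) < 0.1)
gain = np.ones((M, T))
gain[2, T // 2:] = 0.1
X = noise + gain[:, None, :] * voice[None]

W, H, S, G, errors = multichannel_robust_nmf(X, n_bases=2)
print(f"reconstruction error: {errors[0]:.3f} -> {errors[-1]:.3f}")
```

In this sketch the gain `G` plays the role of the time-varying voice level at each microphone, and `S` is the enhanced voice estimate. A Bayesian treatment would replace the fixed `sparsity` weight and point estimates with priors and posterior distributions.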

    Original language: English
    Title of host publication: 2016 24th European Signal Processing Conference, EUSIPCO 2016
    Publisher: European Signal Processing Conference, EUSIPCO
    Pages: 1018-1022
    Number of pages: 5
    Volume: 2016-November
    ISBN (Electronic): 9780992862657
    DOI: https://doi.org/10.1109/EUSIPCO.2016.7760402
    Publication status: Published - 2016 Nov 28
    Event: 24th European Signal Processing Conference, EUSIPCO 2016 - Budapest, Hungary
    Duration: 2016 Aug 28 to 2016 Sep 2

    Other

    Other: 24th European Signal Processing Conference, EUSIPCO 2016
    Country: Hungary
    City: Budapest
    Period: 16/8/28 to 16/9/2

    Fingerprint

    Microphones
    Hose
    Robots
    Blind source separation
    Signal to noise ratio
    Actuators
    Friction
    Experiments

    ASJC Scopus subject areas

    • Signal Processing
    • Electrical and Electronic Engineering

    Cite this

    Bando, Y., Itoyama, K., Konyo, M., Tadokoro, S., Nakadai, K., Yoshii, K., & Okuno, H. G. (2016). Variational Bayesian multi-channel robust NMF for human-voice enhancement with a deformable and partially-occluded microphone array. In 2016 24th European Signal Processing Conference, EUSIPCO 2016 (Vol. 2016-November, pp. 1018-1022). [7760402] European Signal Processing Conference, EUSIPCO. https://doi.org/10.1109/EUSIPCO.2016.7760402
