Deep unfolding for multichannel source separation

Scott Wisdom, John Hershey, Jonathan Le Roux, Shinji Watanabe

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

Deep unfolding has recently been proposed to derive novel deep network architectures from model-based approaches. In this paper, we consider its application to multichannel source separation. We unfold a multichannel Gaussian mixture model (MCGMM), resulting in a deep MCGMM computational network that directly processes complex-valued frequency-domain multichannel audio and has an architecture defined explicitly by a generative model, thus combining the advantages of deep networks and model-based approaches. We further extend the deep MCGMM by modeling the GMM states using an MRF, whose unfolded mean-field inference updates add dynamics across layers. Experiments on source separation for multichannel mixtures of two simultaneous speakers show that the deep MCGMM leads to improved performance with respect to the original MCGMM model.
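As a rough illustration of the deep-unfolding idea described in the abstract (not the paper's actual MCGMM/MRF updates), one can take an iterative inference routine, here a generic mean-field update for binary variables, and treat each iteration as a network layer with its own parameters. All function names, sizes, and parameter values below are illustrative assumptions:

```python
import numpy as np

def mean_field_layer(q, coupling, bias):
    # One mean-field update for binary variables: the new marginal is
    # q_i = sigmoid(sum_j coupling[i, j] * q_j + bias[i]).
    logits = coupling @ q + bias
    return 1.0 / (1.0 + np.exp(-logits))

def unfolded_inference(q0, layer_params):
    # Deep unfolding: run a fixed number of updates, one per "layer".
    # In a trained unfolded network, each layer's (coupling, bias)
    # would be untied across layers and learned by backpropagation.
    q = q0
    for coupling, bias in layer_params:
        q = mean_field_layer(q, coupling, bias)
    return q

rng = np.random.default_rng(0)
# 4 unfolded layers over 3 binary variables (illustrative sizes)
params = [(0.1 * rng.normal(size=(3, 3)), np.zeros(3)) for _ in range(4)]
q = unfolded_inference(np.full(3, 0.5), params)
print(q)  # final marginals, each in (0, 1)
```

The paper's contribution follows the same pattern at a larger scale: the iterative updates come from MCGMM inference on complex-valued multichannel spectra, with MRF mean-field updates coupling the GMM states across layers.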

Original language: English
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 121-125
Number of pages: 5
Volume: 2016-May
ISBN (Electronic): 9781479999880
DOI: 10.1109/ICASSP.2016.7471649
Publication status: Published - 2016 May 18
Externally published: Yes
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: 2016 Mar 20 to 2016 Mar 25

Keywords

  • Deep unfolding
  • Markov random field
  • multichannel GMM
  • source separation

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Wisdom, S., Hershey, J., Le Roux, J., & Watanabe, S. (2016). Deep unfolding for multichannel source separation. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings (Vol. 2016-May, pp. 121-125). [7471649] Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICASSP.2016.7471649
