Ensemble learning for speech enhancement

Jonathan Le Roux, Shinji Watanabe, John R. Hershey

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

13 Citations (Scopus)

Abstract

Over the years, countless algorithms have been proposed to solve the problem of speech enhancement from a noisy mixture. Many have succeeded in improving at least parts of the signal, while often deteriorating others. Based on the assumption that different algorithms are likely to enjoy different qualities and suffer from different flaws, we investigate the possibility of combining the strengths of multiple speech enhancement algorithms, formulating the problem in an ensemble learning framework. As a first example of such a system, we consider the prediction of a time-frequency mask obtained from the clean speech, based on the outputs of various algorithms applied on the noisy mixture. We consider several approaches involving various notions of context and various machine learning algorithms for classification, in the case of binary masks, and regression, in the case of continuous masks. We show that combining several algorithms in this way can lead to an improvement in enhancement performance, while simple averaging or voting techniques fail to do so.
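The stacking idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system. The three "algorithms" are hypothetical stand-ins simulated by corrupting an ideal binary mask (IBM) computed from clean speech and noise powers. Context is taken as the neighboring frames t-1 and t+1. The classifier is plain logistic regression trained by gradient descent. The helper names (`ideal_binary_mask`, `simulate_outputs`, `stack_features`, `train_logistic`, `majority_vote`) are invented for this example. Only the binary-mask (classification) case is shown; the continuous-mask case would replace the classifier with a regressor.

```python
import numpy as np


def ideal_binary_mask(speech_pow, noise_pow, lc_db=0.0):
    """Target mask from the clean signals: 1 where the local SNR exceeds lc_db (dB)."""
    snr_db = 10.0 * np.log10((speech_pow + 1e-12) / (noise_pow + 1e-12))
    return (snr_db > lc_db).astype(float)


def simulate_outputs(rng, n_freq=64, n_frames=100):
    """Hypothetical data: a true IBM plus binary masks from three fake 'algorithms'.

    Algorithm 0 makes 5% random errors. Algorithms 1 and 2 share a common
    failure mode (the same 30% of bins flipped) plus 5% private errors each.
    """
    speech = rng.exponential(1.0, (n_freq, n_frames))
    noise = rng.exponential(1.0, (n_freq, n_frames))
    ibm = ideal_binary_mask(speech, noise)
    shared = rng.random(ibm.shape) < 0.30
    flips = [rng.random(ibm.shape) < 0.05,
             shared | (rng.random(ibm.shape) < 0.05),
             shared | (rng.random(ibm.shape) < 0.05)]
    # |ibm - flip| equals ibm XOR flip, i.e. the mask value is inverted where flip is True
    masks = np.stack([np.abs(ibm - f.astype(float)) for f in flips])  # (K, F, T)
    return masks, ibm


def stack_features(masks, context=1):
    """Per-bin features: the K mask values at frames t-context..t+context, plus a bias.

    Mask values are mapped from {0, 1} to {-1, +1}; edge frames are replicated.
    Returns an array of shape (F*T, K*(2*context+1) + 1), rows ordered as f*T + t.
    """
    _, n_freq, n_frames = masks.shape
    padded = np.pad(2.0 * masks - 1.0, ((0, 0), (0, 0), (context, context)), mode="edge")
    shifted = [padded[:, :, c:c + n_frames] for c in range(2 * context + 1)]
    feats = np.concatenate(shifted, axis=0)          # (K*(2c+1), F, T)
    feats = feats.reshape(feats.shape[0], -1).T      # (F*T, K*(2c+1))
    return np.hstack([feats, np.ones((feats.shape[0], 1))])


def train_logistic(X, y, lr=1.0, n_iter=2000):
    """Batch gradient descent on the mean logistic (cross-entropy) loss."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w


def majority_vote(masks):
    """Baseline: keep a bin when more than half of the K binary masks keep it."""
    return (masks.mean(axis=0) > 0.5).astype(float)


rng = np.random.default_rng(0)
train_masks, train_ibm = simulate_outputs(rng)
test_masks, test_ibm = simulate_outputs(rng)

# Stacking: learn how to combine the base outputs on one utterance, apply to another
w = train_logistic(stack_features(train_masks), train_ibm.reshape(-1))
stacked = (stack_features(test_masks) @ w > 0).astype(float).reshape(test_ibm.shape)
voted = majority_vote(test_masks)

acc_stack = float((stacked == test_ibm).mean())
acc_vote = float((voted == test_ibm).mean())
print(f"stacking accuracy {acc_stack:.3f}, majority vote accuracy {acc_vote:.3f}")
```

In this toy setup two of the inputs share a failure mode, so majority voting inherits their errors, while the learned combiner can weight the more reliable input more heavily. The metric here is simply the fraction of correctly labeled time-frequency bins. Whether a similar gap appears on real enhancement outputs depends on the algorithms, features, and learners used; the paper reports the actual experiments.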

Original language: English
Title of host publication: 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2013
DOI: 10.1109/WASPAA.2013.6701888
Publication status: Published - 2013
Externally published: Yes
Event: 2013 14th IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2013 - New Paltz, NY, United States
Duration: 2013 Oct 20 to 2013 Oct 23

Keywords

  • Classification
  • Ensemble learning
  • Speech enhancement
  • Stacking
  • Time-frequency mask

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications

Cite this

Le Roux, J., Watanabe, S., & Hershey, J. R. (2013). Ensemble learning for speech enhancement. In 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2013 [6701888] https://doi.org/10.1109/WASPAA.2013.6701888

@inproceedings{ea61608845de4d9da9cba4dcd4bc7ea8,
title = "Ensemble learning for speech enhancement",
keywords = "Classification, Ensemble learning, Speech enhancement, Stacking, Time-frequency mask",
author = "{Le Roux}, Jonathan and Shinji Watanabe and Hershey, {John R.}",
year = "2013",
doi = "10.1109/WASPAA.2013.6701888",
language = "English",
isbn = "9781479909728",
booktitle = "2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2013",

}
