Blind separation and dereverberation of speech mixtures by joint optimization

Takuya Yoshioka, Tomohiro Nakatani, Masato Miyoshi, Hiroshi G. Okuno

Research output: Contribution to journal › Article

77 Citations (Scopus)

Abstract

This paper proposes a method for performing blind source separation (BSS) and blind dereverberation (BD) at the same time for speech mixtures. In most previous studies, BSS and BD have been investigated separately. The separation performance of conventional BSS methods deteriorates as the reverberation time increases while many existing BD methods rely on the assumption that there is only one sound source in a room. Therefore, it has been difficult to perform both BSS and BD when the reverberation time is long. The proposed method uses a network, in which dereverberation and separation networks are connected in tandem, to estimate source signals. The parameters for the dereverberation network (prediction matrices) and those for the separation network (separation matrices) are jointly optimized. This enables a BD process to take a BSS process into account. The prediction and separation matrices are alternately optimized with each depending on the other; hence, we call the proposed method the conditional separation and dereverberation (CSD) method. Comprehensive evaluation results are reported, where all the speech materials contained in the complete test set of the TIMIT corpus are used. The CSD method improves the signal-to-interference ratio by an average of about 4 dB over the conventional frequency-domain BSS approach for reverberation times of 0.3 and 0.5 s. The direct-to-reverberation ratio is also improved by about 10 dB.
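The tandem structure and alternating updates described in the abstract can be illustrated with a small numerical sketch. This is not the paper's algorithm: the prediction delay, filter length, variance-weighted multichannel linear prediction update, and the simple decorrelation used as the separation step are all assumptions chosen to make the skeleton self-contained and runnable on synthetic data for a single frequency bin.

```python
# Toy sketch of alternating ("conditional") optimization of a dereverberation
# network (prediction matrix G) and a separation network (separation matrix W),
# loosely in the spirit of the tandem structure described in the abstract.
# All settings and update rules below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

M, T = 2, 200          # microphones, STFT frames (one frequency bin)
delta, taps = 2, 5     # prediction delay and filter length (assumed values)

# Synthetic observation: sources mixed instantaneously, then smeared by a
# short convolutive tail to mimic reverberation.
S = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
X = A @ S
for k in range(1, taps + 1):
    X[:, k:] += 0.3 / k * (A @ S)[:, :-k]   # crude reverberant tail

def stack_past(X, delta, taps):
    """Stack delayed frames X[:, t-delta-taps+1 .. t-delta] for each frame t."""
    M, T = X.shape
    Z = np.zeros((M * taps, T), dtype=complex)
    for i in range(taps):
        lag = delta + i
        Z[i * M:(i + 1) * M, lag:] = X[:, :T - lag]
    return Z

Z = stack_past(X, delta, taps)
W = np.eye(M, dtype=complex)                 # separation matrix (init: identity)
G = np.zeros((M * taps, M), dtype=complex)   # prediction matrix (init: zero)

for _ in range(10):
    # Dereverberation step: multichannel linear prediction, with regressors
    # weighted by the current source-power estimate derived from W.
    D = X - G.conj().T @ Z                   # current dereverberated signal
    var = np.maximum(np.mean(np.abs(W @ D) ** 2, axis=0), 1e-6)
    Zw = Z / var                             # variance-weighted regressors
    G = np.linalg.solve(Zw @ Z.conj().T + 1e-6 * np.eye(M * taps),
                        Zw @ X.conj().T)
    D = X - G.conj().T @ Z
    # Separation step (stand-in): decorrelate the dereverberated signal.
    C = D @ D.conj().T / T
    evals, evecs = np.linalg.eigh(C)
    W = np.diag(evals ** -0.5) @ evecs.conj().T

Y = W @ D                                    # separated, dereverberated estimate
print("output shape:", Y.shape)
```

Each pass updates the prediction matrix with the separation matrix held fixed and vice versa, which is the "each depending on the other" coupling the abstract refers to; a faithful implementation would replace the decorrelation stand-in with a proper frequency-domain separation update.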

Original language: English
Article number: 5428853
Pages (from-to): 69-84
Number of pages: 16
Journal: IEEE Transactions on Audio, Speech, and Language Processing
Volume: 19
Issue number: 1
DOIs: 10.1109/TASL.2010.2045183
Publication status: Published - 2011
Externally published: Yes

Keywords

  • blind dereverberation (BD)
  • Blind source separation (BSS)
  • conditional separation and dereverberation (CSD)

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

Blind separation and dereverberation of speech mixtures by joint optimization. / Yoshioka, Takuya; Nakatani, Tomohiro; Miyoshi, Masato; Okuno, Hiroshi G.

In: IEEE Transactions on Audio, Speech, and Language Processing, Vol. 19, No. 1, 5428853, 2011, p. 69-84.

Research output: Contribution to journal › Article

@article{6b837291c1614533b674a963d853b8bb,
title = "Blind separation and dereverberation of speech mixtures by joint optimization",
abstract = "This paper proposes a method for performing blind source separation (BSS) and blind dereverberation (BD) at the same time for speech mixtures. In most previous studies, BSS and BD have been investigated separately. The separation performance of conventional BSS methods deteriorates as the reverberation time increases while many existing BD methods rely on the assumption that there is only one sound source in a room. Therefore, it has been difficult to perform both BSS and BD when the reverberation time is long. The proposed method uses a network, in which dereverberation and separation networks are connected in tandem, to estimate source signals. The parameters for the dereverberation network (prediction matrices) and those for the separation network (separation matrices) are jointly optimized. This enables a BD process to take a BSS process into account. The prediction and separation matrices are alternately optimized with each depending on the other; hence, we call the proposed method the conditional separation and dereverberation (CSD) method. Comprehensive evaluation results are reported, where all the speech materials contained in the complete test set of the TIMIT corpus are used. The CSD method improves the signal-to-interference ratio by an average of about 4 dB over the conventional frequency-domain BSS approach for reverberation times of 0.3 and 0.5 s. The direct-to-reverberation ratio is also improved by about 10 dB.",
keywords = "blind dereverberation (BD), Blind source separation (BSS), conditional separation and dereverberation (CSD)",
author = "Takuya Yoshioka and Tomohiro Nakatani and Masato Miyoshi and Okuno, {Hiroshi G.}",
year = "2011",
doi = "10.1109/TASL.2010.2045183",
language = "English",
volume = "19",
pages = "69--84",
journal = "IEEE Transactions on Audio, Speech, and Language Processing",
issn = "1558-7916",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "1",

}

TY - JOUR

T1 - Blind separation and dereverberation of speech mixtures by joint optimization

AU - Yoshioka, Takuya

AU - Nakatani, Tomohiro

AU - Miyoshi, Masato

AU - Okuno, Hiroshi G.

PY - 2011

Y1 - 2011

N2 - This paper proposes a method for performing blind source separation (BSS) and blind dereverberation (BD) at the same time for speech mixtures. In most previous studies, BSS and BD have been investigated separately. The separation performance of conventional BSS methods deteriorates as the reverberation time increases while many existing BD methods rely on the assumption that there is only one sound source in a room. Therefore, it has been difficult to perform both BSS and BD when the reverberation time is long. The proposed method uses a network, in which dereverberation and separation networks are connected in tandem, to estimate source signals. The parameters for the dereverberation network (prediction matrices) and those for the separation network (separation matrices) are jointly optimized. This enables a BD process to take a BSS process into account. The prediction and separation matrices are alternately optimized with each depending on the other; hence, we call the proposed method the conditional separation and dereverberation (CSD) method. Comprehensive evaluation results are reported, where all the speech materials contained in the complete test set of the TIMIT corpus are used. The CSD method improves the signal-to-interference ratio by an average of about 4 dB over the conventional frequency-domain BSS approach for reverberation times of 0.3 and 0.5 s. The direct-to-reverberation ratio is also improved by about 10 dB.

AB - This paper proposes a method for performing blind source separation (BSS) and blind dereverberation (BD) at the same time for speech mixtures. In most previous studies, BSS and BD have been investigated separately. The separation performance of conventional BSS methods deteriorates as the reverberation time increases while many existing BD methods rely on the assumption that there is only one sound source in a room. Therefore, it has been difficult to perform both BSS and BD when the reverberation time is long. The proposed method uses a network, in which dereverberation and separation networks are connected in tandem, to estimate source signals. The parameters for the dereverberation network (prediction matrices) and those for the separation network (separation matrices) are jointly optimized. This enables a BD process to take a BSS process into account. The prediction and separation matrices are alternately optimized with each depending on the other; hence, we call the proposed method the conditional separation and dereverberation (CSD) method. Comprehensive evaluation results are reported, where all the speech materials contained in the complete test set of the TIMIT corpus are used. The CSD method improves the signal-to-interference ratio by an average of about 4 dB over the conventional frequency-domain BSS approach for reverberation times of 0.3 and 0.5 s. The direct-to-reverberation ratio is also improved by about 10 dB.

KW - blind dereverberation (BD)

KW - Blind source separation (BSS)

KW - conditional separation and dereverberation (CSD)

UR - http://www.scopus.com/inward/record.url?scp=77957745677&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77957745677&partnerID=8YFLogxK

U2 - 10.1109/TASL.2010.2045183

DO - 10.1109/TASL.2010.2045183

M3 - Article

AN - SCOPUS:77957745677

VL - 19

SP - 69

EP - 84

JO - IEEE Transactions on Audio, Speech, and Language Processing

JF - IEEE Transactions on Audio, Speech, and Language Processing

SN - 1558-7916

IS - 1

M1 - 5428853

ER -