A GMM sound source model for blind speech separation in under-determined conditions

Yasuharu Hirasawa, Naoki Yasuraoka, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This paper focuses on blind speech separation in under-determined conditions, that is, when there are more sound sources than microphones. We introduce a sound source model based on the Gaussian mixture model (GMM) to represent a speech signal in the time-frequency domain, and derive rules for updating the model parameters using the auxiliary function method. Our GMM sound source model consists of two kinds of Gaussians: sharp ones representing harmonic parts and smooth ones representing nonharmonic parts. Experimental results reveal that our method outperforms a method based on non-negative matrix factorization (NMF) by 0.7 dB in signal-to-distortion ratio (SDR) and by 1.7 dB in signal-to-interference ratio (SIR), indicating that it effectively removes interference from other talkers.
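
The paper itself is not included in this record, so the following is only a minimal Python sketch of the general idea stated in the abstract: a frame's spectrum is modeled as a mixture of sharp Gaussians placed at harmonics of the fundamental frequency plus broad, smooth Gaussians that absorb nonharmonic energy. The function name gmm_spectral_model, the fixed bandwidths, and the uniform component weights are illustrative assumptions; the paper's actual parameterization and its auxiliary-function update rules are not reproduced here.

import numpy as np

# Minimal, illustrative sketch (not the authors' implementation): a speech
# spectrum modeled as a sum of two kinds of Gaussian components, following
# the abstract's description -- sharp Gaussians at harmonics of the
# fundamental frequency and broad Gaussians for nonharmonic energy.
# The bandwidths and weights below are assumed for illustration only.

def gmm_spectral_model(freqs_hz, f0_hz, n_harmonics=10, n_smooth=4,
                       sharp_bw_hz=20.0, smooth_bw_hz=800.0):
    """Return a model power spectrum evaluated on the frequency grid freqs_hz."""
    spectrum = np.zeros_like(freqs_hz, dtype=float)

    # Sharp Gaussians placed at integer multiples of the fundamental (harmonic part).
    for k in range(1, n_harmonics + 1):
        mu = k * f0_hz
        spectrum += (1.0 / n_harmonics) * np.exp(-0.5 * ((freqs_hz - mu) / sharp_bw_hz) ** 2)

    # Broad Gaussians spread over the band (nonharmonic part, e.g. fricative energy).
    for mu in np.linspace(freqs_hz.min(), freqs_hz.max(), n_smooth):
        spectrum += (1.0 / n_smooth) * np.exp(-0.5 * ((freqs_hz - mu) / smooth_bw_hz) ** 2)

    return spectrum

# Example: one speech frame on a 0-8 kHz grid with an assumed f0 of 150 Hz.
freqs = np.linspace(0.0, 8000.0, 513)
frame_model = gmm_spectral_model(freqs, f0_hz=150.0)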

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 446-453
Number of pages: 8
Volume: 7191 LNCS
DOIs: 10.1007/978-3-642-28551-6_55
ISBN (Print): 9783642285509
Publication status: Published - 2012
Externally published: Yes
Event: 10th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2012 - Tel Aviv
Duration: 2012 Mar 12 - 2012 Mar 15

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7191 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2012
City: Tel Aviv
Period: 12/3/12 - 12/3/15

Keywords

  • Auxiliary function method
  • Blind speech separation
  • GMM sound source model
  • Under-determined condition

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Hirasawa, Y., Yasuraoka, N., Takahashi, T., Ogata, T., & Okuno, H. G. (2012). A GMM sound source model for blind speech separation in under-determined conditions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7191 LNCS, pp. 446-453). https://doi.org/10.1007/978-3-642-28551-6_55
