A GMM sound source model for blind speech separation in under-determined conditions

Yasuharu Hirasawa, Naoki Yasuraoka, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This paper focuses on blind speech separation in under-determined conditions, that is, when there are more sound sources than microphones. We introduce a sound source model based on the Gaussian mixture model (GMM) to represent a speech signal in the time-frequency domain, and derive rules for updating the model parameters using the auxiliary function method. Our GMM sound source model consists of two kinds of Gaussians: sharp ones representing harmonic parts and smooth ones representing nonharmonic parts. Experimental results show that our method outperforms a method based on non-negative matrix factorization (NMF) by 0.7 dB in signal-to-distortion ratio (SDR) and by 1.7 dB in signal-to-interference ratio (SIR). The larger gain in SIR indicates that our method effectively removes interference from other talkers.
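The abstract does not give the model's exact form, so the following is only a rough illustration of the sharp/smooth Gaussian idea and of an auxiliary-function update. It assumes a single time frame and represents its power spectrum along frequency as a weighted sum of sharp Gaussians centred on harmonics k·f0 plus a few broad Gaussians for the nonharmonic part. The non-negative weights are then estimated with multiplicative updates derived from an auxiliary function for the generalized Kullback-Leibler divergence. The helper names (`build_bases`, `fit_weights`, `kl_divergence`), the fixed f0, the Gaussian widths, and the choice of divergence are all assumptions made for illustration. The paper's actual model, cost function, and update rules (which must also handle mixing across microphones) may differ.

```python
import math


def gaussian(freqs, mean, std):
    """Unnormalised Gaussian bump evaluated at each frequency bin."""
    return [math.exp(-0.5 * ((f - mean) / std) ** 2) for f in freqs]


def build_bases(freqs, f0, n_harmonics, sharp_std, smooth_means, smooth_std):
    """Hypothetical basis set for one frame:
    sharp Gaussians at k * f0 model the harmonic part,
    broad Gaussians at fixed centres model the nonharmonic part."""
    harmonic = [gaussian(freqs, k * f0, sharp_std) for k in range(1, n_harmonics + 1)]
    smooth = [gaussian(freqs, m, smooth_std) for m in smooth_means]
    return harmonic + smooth


def model(bases, weights):
    """Weighted sum of the Gaussian bases (the modelled power spectrum)."""
    n_bins = len(bases[0])
    return [sum(w * b[i] for w, b in zip(weights, bases)) for i in range(n_bins)]


def kl_divergence(y, x, eps=1e-12):
    """Generalised Kullback-Leibler divergence D(y || x) between spectra."""
    return sum(yi * math.log((yi + eps) / (xi + eps)) - yi + xi for yi, xi in zip(y, x))


def fit_weights(spectrum, bases, n_iter=200, eps=1e-12):
    """Assumed estimator, not the paper's update rules: multiplicative updates
    w_k <- w_k * sum_f B_k(f) Y(f)/X(f) / sum_f B_k(f), which follow from an
    auxiliary function (Jensen's inequality) for the generalised KL divergence
    and do not increase it from one iteration to the next."""
    weights = [1.0] * len(bases)
    basis_sums = [sum(b) for b in bases]  # denominators are fixed, precompute once
    for _ in range(n_iter):
        approx = model(bases, weights)
        ratio = [y / (x + eps) for y, x in zip(spectrum, approx)]
        weights = [
            w * sum(bi * ri for bi, ri in zip(b, ratio)) / (s + eps)
            for w, b, s in zip(weights, bases, basis_sums)
        ]
    return weights


if __name__ == "__main__":
    freqs = [10.0 * i for i in range(256)]  # hypothetical 10 Hz bin spacing
    bases = build_bases(freqs, f0=200.0, n_harmonics=8, sharp_std=15.0,
                        smooth_means=[500.0, 1500.0, 2500.0], smooth_std=400.0)
    true_w = [1.0 / k for k in range(1, 9)] + [0.05, 0.03, 0.02]
    spectrum = model(bases, true_w)  # synthetic frame, not real data
    w = fit_weights(spectrum, bases)
    print("fitted weights:", [round(v, 3) for v in w])
    print("KL after fit:", kl_divergence(spectrum, model(bases, w)))
```

Each multiplicative update keeps the weights non-negative without any projection step, which is one practical appeal of auxiliary-function derivations. The sharp/smooth split lets narrow bases lock onto pitch harmonics while broad bases absorb the remaining spectral energy.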

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 446-453
Number of pages: 8
Volume: 7191 LNCS
DOIs: https://doi.org/10.1007/978-3-642-28551-6_55
Publication status: Published - 2012
Externally published: Yes
Event: 10th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2012 - Tel Aviv
Duration: 2012 Mar 12 to 2012 Mar 15

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7191 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2012
City: Tel Aviv
Period: 12/3/12 to 15/3/12

Keywords

  • Auxiliary function method
  • Blind speech separation
  • GMM sound source model
  • Under-determined condition

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Hirasawa, Y., Yasuraoka, N., Takahashi, T., Ogata, T., & Okuno, H. G. (2012). A GMM sound source model for blind speech separation in under-determined conditions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7191 LNCS, pp. 446-453). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7191 LNCS). https://doi.org/10.1007/978-3-642-28551-6_55