Blind speech separation in a meeting situation with maximum SNR beamformers

Shoko Araki, Hiroshi Sawada, Shoji Makino

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

61 Citations (Scopus)

Abstract

We propose a speech separation method for a meeting situation, in which each speaker talks only intermittently and the number of active speakers changes from moment to moment. Many source separation methods have already been proposed; however, they assume that all the speakers keep speaking, which is not always true in a real meeting. In such cases, in addition to separation, speech detection and the classification of the detected speech according to speaker become important issues. For that purpose, we propose a method that employs a maximum signal-to-noise ratio (MaxSNR) beamformer combined with a voice activity detector and online clustering. We also discuss the scaling ambiguity problem of the MaxSNR beamformer and provide solutions to it. We report some encouraging results for a real meeting in a room with a reverberation time of about 350 ms.
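As an illustration of the MaxSNR criterion named in the abstract, the following is a minimal sketch of the textbook formulation, not necessarily the authors' exact procedure. It assumes that, for one frequency bin, a covariance matrix of the target speech (`R_target`) and one of the interference and noise (`R_noise`) are already available; how these are estimated (for example from voice activity detection and clustering results) is outside the sketch. All function and variable names, and the toy data, are hypothetical.

```python
import numpy as np

def max_snr_beamformer(R_target, R_noise):
    """Return (w, snr_max) where w maximizes (w^H R_target w) / (w^H R_noise w)."""
    # Whiten with the Cholesky factor of the noise covariance: R_noise = L L^H.
    L = np.linalg.cholesky(R_noise)
    L_inv = np.linalg.inv(L)
    # In whitened coordinates the ratio becomes an ordinary Rayleigh quotient.
    M = L_inv @ R_target @ L_inv.conj().T
    M = (M + M.conj().T) / 2  # enforce Hermitian symmetry against round-off
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    v = eigvecs[:, -1]  # eigenvector of the largest eigenvalue
    w = L_inv.conj().T @ v  # map back to the original coordinates
    return w, eigvals[-1]

def output_snr(w, R_target, R_noise):
    """Ratio of target power to noise power at the beamformer output."""
    num = np.real(w.conj() @ R_target @ w)
    den = np.real(w.conj() @ R_noise @ w)
    return num / den

# Toy data (assumption): 3 microphones, one rank-1 target, random noise.
rng = np.random.default_rng(0)
n_mics = 3
a = rng.standard_normal(n_mics) + 1j * rng.standard_normal(n_mics)
R_target = np.outer(a, a.conj())
noise = rng.standard_normal((n_mics, 200)) + 1j * rng.standard_normal((n_mics, 200))
R_noise = noise @ noise.conj().T / 200

w, snr_max = max_snr_beamformer(R_target, R_noise)
```

Any nonzero complex multiple of `w` yields the same output SNR, so the criterion alone does not fix the gain of the separated signal in each frequency bin; this is the scaling ambiguity that the abstract refers to.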

Original language: English
Title of host publication: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07
Pages: I41-I44
Publication status: Published - 2007
Externally published: Yes
Event: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07 - Honolulu, HI, United States
Duration: 2007 Apr 15 - 2007 Apr 20

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 1
ISSN (Print): 1520-6149

Conference

Conference: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07
Country: United States
City: Honolulu, HI
Period: 07/4/15 - 07/4/20

Keywords

  • Ambiguity
  • Maximum SNR beamformer
  • Online clustering
  • Scaling
  • Speech separation
  • Voice activity detector

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
