MLSP 2007 data analysis competition: Frequency-domain blind source separation for convolutive mixtures of speech/audio signals

Hiroshi Sawada, Shoko Araki, Shoji Makino

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Citations (Scopus)

Abstract

This paper describes a frequency-domain approach to the blind source separation of speech/audio signals that are convolutively mixed in a real room environment. By applying short-time Fourier transforms, convolutive mixtures in the time domain can be approximated as multiple instantaneous mixtures in the frequency domain. We employ complex-valued independent component analysis (ICA) to separate the mixtures in each frequency bin. The permutation ambiguity of the ICA solutions must then be aligned so that the separated signals are correctly reconstructed in the time domain. We propose a permutation alignment method based on clustering the activity sequences of the frequency bin-wise separated signals. The presented method made us the overall winner of the MLSP 2007 Data Analysis Competition.
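The permutation alignment step can be illustrated with a simplified sketch. The snippet below is not the authors' exact algorithm: it simulates bin-wise separated amplitude envelopes (the "activity sequences") under an unknown per-bin permutation, then assigns each bin's outputs to clusters by maximizing envelope correlation with running centroids. All variable names and the synthetic envelope model are illustrative assumptions.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)
n_src, n_bins, n_frames = 2, 64, 200

# Hypothetical ground-truth activity sequences for two speech-like sources:
# each source is dominant during a different time span.
base = np.abs(rng.standard_normal((n_src, n_frames)))
base[0, :100] *= 5.0   # source 0 dominant in the first half
base[1, 100:] *= 5.0   # source 1 dominant in the second half

# Simulated bin-wise ICA outputs: each frequency bin observes the envelopes
# under an unknown permutation, plus a small amount of noise.
true_perm = np.array([rng.permutation(n_src) for _ in range(n_bins)])
env = np.empty((n_bins, n_src, n_frames))
for f in range(n_bins):
    env[f] = base[true_perm[f]] + 0.1 * np.abs(rng.standard_normal((n_src, n_frames)))

def normalize(x):
    # Zero-mean, unit-norm rows so the dot product is a correlation coefficient.
    x = x - x.mean(axis=-1, keepdims=True)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

# Clustering-based alignment: take bin 0's outputs as initial centroids, then
# for each bin pick the output-to-centroid assignment that maximizes the total
# correlation, updating centroids with a running mean.
centroids = normalize(env[0].copy())
perm_est = np.zeros((n_bins, n_src), dtype=int)
for f in range(n_bins):
    corr = normalize(env[f]) @ normalize(centroids).T   # (n_src, n_src)
    best = max(permutations(range(n_src)),
               key=lambda p: sum(corr[i, p[i]] for i in range(n_src)))
    perm_est[f] = best          # output i of bin f belongs to cluster best[i]
    for i in range(n_src):      # running centroid update
        centroids[best[i]] = 0.9 * centroids[best[i]] + 0.1 * normalize(env[f])[i]
```

Because speech envelopes of distinct sources are weakly (here, negatively) correlated over time, the per-bin assignment is decisive, and the recovered cluster labels are consistent across bins up to one global permutation, which is exactly what time-domain reconstruction requires.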

Original language: English
Title of host publication: Machine Learning for Signal Processing 17 - Proceedings of the 2007 IEEE Signal Processing Society Workshop, MLSP
Pages: 45-50
Number of pages: 6
DOIs
Publication status: Published - 2007
Externally published: Yes
Event: 17th IEEE International Workshop on Machine Learning for Signal Processing, MLSP-2007 - Thessaloniki, Greece
Duration: 2007 Aug 27 - 2007 Aug 29

Publication series

Name: Machine Learning for Signal Processing 17 - Proceedings of the 2007 IEEE Signal Processing Society Workshop, MLSP

Conference

Conference: 17th IEEE International Workshop on Machine Learning for Signal Processing, MLSP-2007
Country: Greece
City: Thessaloniki
Period: 07/8/27 - 07/8/29

ASJC Scopus subject areas

  • Computer Science (all)
  • Signal Processing

