Fundamental limitation of frequency domain blind source separation for convolutive mixture of speech

Shoko Araki, Shoji Makino, Tsuyoki Nishikawa, Hiroshi Saruwatari

Research output: Contribution to journal › Article › peer-review

60 Citations (Scopus)

Abstract

Despite several recent proposals for Blind Source Separation (BSS) of realistic acoustic signals, separation performance remains insufficient. In particular, performance is severely limited when the room impulse response is long. In this paper, we show that it is useless to be constrained by the condition P ≪ T, where T is the FFT frame size and P is the length of the room impulse response. In our experiments, a frame size of 256 or 512 (32 or 64 ms at a sampling frequency of 8 kHz) is best even for long room reverberation times of TR = 150 and 300 ms. We also clarify the reason for the poor performance of BSS in highly reverberant environments: separation is achieved chiefly for the sound from the direction of the jammer, because BSS cannot calculate the inverse of the room transfer function for both the target and jammer signals.
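The frame-size arithmetic in the abstract can be checked with a short sketch. It converts the frame sizes T = 256 and 512 samples to milliseconds at the stated 8 kHz sampling frequency, and converts the reverberation times of 150 and 300 ms to samples. The function names are illustrative, and treating the reverberation time as a rough proxy for the impulse response length P is an assumption made here, not a statement from the paper.

```python
FS_HZ = 8000  # sampling frequency stated in the abstract


def samples_to_ms(n_samples, fs_hz=FS_HZ):
    """Duration in milliseconds of n_samples at sampling rate fs_hz."""
    return 1000.0 * n_samples / fs_hz


def ms_to_samples(duration_ms, fs_hz=FS_HZ):
    """Number of samples spanned by duration_ms at sampling rate fs_hz."""
    return int(round(duration_ms * fs_hz / 1000.0))


# Frame sizes T reported as best in the abstract.
frame_ms = {T: samples_to_ms(T) for T in (256, 512)}

# Reverberation times from the abstract, expressed in samples.
# Assumption: used only as a rough proxy for the impulse response length P.
reverb_samples = {tr: ms_to_samples(tr) for tr in (150, 300)}

for T, ms in frame_ms.items():
    print(f"T = {T} samples -> {ms} ms")
for tr, n in reverb_samples.items():
    print(f"TR = {tr} ms -> {n} samples")
```

Under that assumption, both reverberation times span more samples than either frame size, so the best-performing frames do not satisfy P ≪ T.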

Original language: English
Pages (from-to): 2737-2740
Number of pages: 4
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 5
Publication status: Published - 2001
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
