Bayesian nonparametrics for microphone array processing

Takuma Otsuka, Katsuhiko Ishiguro, Hiroshi Sawada, Hiroshi G. Okuno

Research output: Contribution to journal › Article

29 Citations (Scopus)

Abstract

Sound source localization and separation from a mixture of sounds are essential functions for computational auditory scene analysis. The main challenges are designing a unified framework for joint optimization and estimating the sound sources under auditory uncertainties such as reverberation or an unknown number of sound sources. Because sound source localization and separation are mutually dependent, they must be estimated simultaneously for better and more robust performance. A unified model is presented for sound source localization and separation based on Bayesian nonparametrics. Experiments using simulated and recorded audio mixtures show that a method based on this model achieves state-of-the-art sound source separation quality and estimates the number of sources more robustly in reverberant environments.

Original language: English
Pages (from-to): 493-504
Number of pages: 12
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 22
Issue number: 2
DOIs: 10.1109/TASLP.2013.2294582
Publication status: Published - 2014
Externally published: Yes

Fingerprint

Array processing
Microphones
Acoustic waves
acoustics
scene analysis
Source separation
Reverberation
estimating
optimization

Keywords

  • Audio source separation and enhancement (AUDSSEN)
  • Bayesian nonparametrics
  • Blind source separation
  • Microphone array processing
  • Sound source localization
  • Spatial and multichannel audio (AUD-SMCA)
  • Time-frequency masking

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

Bayesian nonparametrics for microphone array processing. / Otsuka, Takuma; Ishiguro, Katsuhiko; Sawada, Hiroshi; Okuno, Hiroshi G.

In: IEEE Transactions on Audio, Speech and Language Processing, Vol. 22, No. 2, 2014, p. 493-504.

Otsuka, Takuma ; Ishiguro, Katsuhiko ; Sawada, Hiroshi ; Okuno, Hiroshi G. / Bayesian nonparametrics for microphone array processing. In: IEEE Transactions on Audio, Speech and Language Processing. 2014 ; Vol. 22, No. 2. pp. 493-504.
@article{f283292648a04011bc37805aab1be6af,
title = "Bayesian nonparametrics for microphone array processing",
abstract = "Sound source localization and separation from a mixture of sounds are essential functions for computational auditory scene analysis. The main challenges are designing a unified framework for joint optimization and estimating the sound sources under auditory uncertainties such as reverberation or unknown number of sounds. Since sound source localization and separation are mutually dependent, their simultaneous estimation is required for better and more robust performance. A unified model is presented for sound source localization and separation based on Bayesian nonparametrics. Experiments using simulated and recorded audio mixtures show that a method based on this model achieves state-of-the-art sound source separation quality and has more robust performance on the source number estimation under reverberant environments.",
keywords = "Audio source separation and enhancement (AUDSSEN), Bayesian nonparametrics, Blind source separation, Microphone array processing, Sound source localization, Spatial and multichannel audio (AUD-SMCA), Time-frequency masking",
author = "Takuma Otsuka and Katsuhiko Ishiguro and Hiroshi Sawada and Okuno, {Hiroshi G.}",
year = "2014",
doi = "10.1109/TASLP.2013.2294582",
language = "English",
volume = "22",
pages = "493--504",
journal = "IEEE Transactions on Audio, Speech and Language Processing",
issn = "1558-7916",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "2",

}

TY - JOUR

T1 - Bayesian nonparametrics for microphone array processing

AU - Otsuka, Takuma

AU - Ishiguro, Katsuhiko

AU - Sawada, Hiroshi

AU - Okuno, Hiroshi G.

PY - 2014

Y1 - 2014

AB - Sound source localization and separation from a mixture of sounds are essential functions for computational auditory scene analysis. The main challenges are designing a unified framework for joint optimization and estimating the sound sources under auditory uncertainties such as reverberation or unknown number of sounds. Since sound source localization and separation are mutually dependent, their simultaneous estimation is required for better and more robust performance. A unified model is presented for sound source localization and separation based on Bayesian nonparametrics. Experiments using simulated and recorded audio mixtures show that a method based on this model achieves state-of-the-art sound source separation quality and has more robust performance on the source number estimation under reverberant environments.

KW - Audio source separation and enhancement (AUDSSEN)

KW - Bayesian nonparametrics

KW - Blind source separation

KW - Microphone array processing

KW - Sound source localization

KW - Spatial and multichannel audio (AUD-SMCA)

KW - Time-frequency masking

UR - http://www.scopus.com/inward/record.url?scp=84897935648&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84897935648&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2013.2294582

DO - 10.1109/TASLP.2013.2294582

M3 - Article

VL - 22

SP - 493

EP - 504

JO - IEEE Transactions on Audio, Speech and Language Processing

JF - IEEE Transactions on Audio, Speech and Language Processing

SN - 1558-7916

IS - 2

ER -