Separating three simultaneous speeches with two microphones by integrating auditory and visual processing

Hiroshi G. Okuno, Kazuhiro Nakadai, Tino Lourens, Hiroaki Kitano

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Citations (Scopus)

Abstract

This paper addresses the automatic recognition of three simultaneous speeches with two microphones, that is, sound source separation where the number of sound sources exceeds the number of microphones. The approach is a direction-pass filter, implemented by hypothetical reasoning on the interaural phase difference (IPD) and the interaural intensity difference (IID). Auditory processing calculates the IPD and IID for each subband and generates hypotheses for the precalculated IPD and IID of every direction, including one obtained by visual processing. The system then calculates the belief factor of each hypothesis by Dempster-Shafer theory and determines the direction of each subband. Subbands of the specified direction are collected and converted to a waveform by inverse FFT. On 200 benchmarks of three simultaneous utterances of Japanese words, the average 1-best and 10-best recognition rates of the extracted speeches are 60% and 81%, respectively.
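The direction-pass idea in the abstract can be illustrated with a much simplified sketch. This is not the authors' implementation. It assumes a free-field two-microphone model in place of precalculated head-related IPD and IID, uses IPD only, and replaces the Dempster-Shafer belief computation with a nearest-match rule. The function names, the 0.18 m microphone spacing, and the sign convention (positive angles point toward the left microphone) are all hypothetical choices made for illustration.

```python
import numpy as np

SOUND_SPEED = 343.0  # m/s, assumed speed of sound in air


def expected_ipd(freqs, theta_deg, mic_distance):
    """Free-field IPD in radians for a source at theta_deg.

    Assumed convention: 0 is straight ahead, positive angles are toward
    the left microphone, so the right channel lags the left one.
    """
    delay = mic_distance * np.sin(np.deg2rad(theta_deg)) / SOUND_SPEED
    return 2.0 * np.pi * freqs * delay


def direction_pass_filter(left, right, sample_rate, target_deg,
                          candidates_deg, mic_distance=0.18):
    """Keep only the subbands whose IPD best matches target_deg."""
    spec_l = np.fft.rfft(left)
    spec_r = np.fft.rfft(right)
    freqs = np.fft.rfftfreq(len(left), d=1.0 / sample_rate)

    # Observed IPD per subband (FFT bin), wrapped to (-pi, pi].
    ipd = np.angle(spec_l * np.conj(spec_r))

    # Wrapped phase error between the observation and each candidate
    # direction's expected IPD. One row per candidate direction.
    cands = np.asarray(candidates_deg, dtype=float)
    errors = np.stack([
        np.abs(np.angle(np.exp(1j * (ipd - expected_ipd(freqs, c, mic_distance)))))
        for c in cands
    ])

    # Nearest-match rule (a stand-in for the belief-based decision):
    # assign each subband to the candidate with the smallest error.
    best_idx = np.argmin(errors, axis=0)
    target_idx = int(np.argmin(np.abs(cands - target_deg)))
    mask = best_idx == target_idx

    # Collect the subbands of the target direction and resynthesize.
    return np.fft.irfft(spec_l * mask, n=len(left))
```

In this sketch, `target_deg` plays the role of the direction that, according to the abstract, can be supplied by visual processing. With a realistic microphone spacing the IPD wraps at higher frequencies, so an IPD-only rule becomes ambiguous there. That limitation is one reason a second cue such as IID is useful.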

Original language: English
Title of host publication: EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology
Publisher: International Speech Communication Association
Pages: 2643-2646
Number of pages: 4
ISBN (Electronic): 8790834100, 9788790834104
Publication status: Published - 2001
Externally published: Yes
Event: 7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001 - Aalborg, Denmark
Duration: 2001 Sep 3 to 2001 Sep 7

ASJC Scopus subject areas

  • Communication
  • Linguistics and Language
  • Computer Science Applications
  • Software

Cite this

Okuno, H. G., Nakadai, K., Lourens, T., & Kitano, H. (2001). Separating three simultaneous speeches with two microphones by integrating auditory and visual processing. In EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology (pp. 2643-2646). International Speech Communication Association.