Listening to two simultaneous speeches

Hiroshi G. Okuno, Tomohiro Nakatani, Takeshi Kawabata

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

Speech stream segregation is presented as a new speech enhancement technique for automatic speech recognition. Two issues are addressed: segregating speech streams from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition. Speech stream segregation is modeled as a process of extracting harmonic fragments, grouping the extracted fragments, and substituting a non-harmonic residue for the non-harmonic parts of a group. The main problem in interfacing speech stream segregation with hidden Markov model (HMM)-based speech recognition is how to reduce the degradation of recognition performance caused by spectral distortion of the segregated sounds, which arises mainly from the transfer function of the binaural input. Our solution is to re-train the HMM parameters with training data binauralized for four directions. Experiments with 500 mixtures of two women's utterances of isolated words showed that the error reduction rates of 1-best and 10-best word recognition of each woman's utterance are, on average, 64% and 75%, respectively.
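The extraction step in the abstract can be illustrated with a minimal sketch. The function below is a hypothetical simplification (not the authors' actual algorithm): given a magnitude spectrum and a pitch estimate, it keeps the spectral bins lying near integer multiples of the fundamental as a harmonic fragment and treats everything else as the non-harmonic residue. The function name, tolerance parameter, and toy spectrum are all assumptions for illustration.

```python
import numpy as np

def extract_harmonic_fragment(spectrum, freqs, f0, tolerance_hz=20.0):
    """Split a magnitude spectrum into a harmonic fragment and a residue.

    Bins within `tolerance_hz` of an integer multiple of the pitch
    estimate `f0` form the harmonic fragment; all remaining bins form
    the non-harmonic residue. A simplified, hypothetical sketch of the
    harmonic-fragment extraction described in the abstract.
    """
    harmonic_mask = np.zeros_like(spectrum, dtype=bool)
    max_harmonic = int(freqs[-1] // f0)
    for k in range(1, max_harmonic + 1):
        harmonic_mask |= np.abs(freqs - k * f0) <= tolerance_hz
    fragment = np.where(harmonic_mask, spectrum, 0.0)
    residue = np.where(harmonic_mask, 0.0, spectrum)
    return fragment, residue

# Usage: a toy spectrum with unit peaks at the harmonics of 200 Hz
freqs = np.linspace(0, 4000, 401)                  # 10 Hz resolution
spectrum = np.zeros_like(freqs)
spectrum[(freqs % 200 == 0) & (freqs > 0)] = 1.0   # harmonics of 200 Hz
fragment, residue = extract_harmonic_fragment(spectrum, freqs, f0=200.0)
```

In a real system, per-frame pitch estimates would track each talker over time, and fragments with consistent pitch contours would then be grouped into one speech stream.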

Original language: English
Pages (from-to): 299-310
Number of pages: 12
Journal: Speech Communication
Volume: 27
Issue number: 3
DOI: 10.1016/S0167-6393(98)00080-6
Publication status: Published - Apr 1999
Externally published: Yes

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Experimental and Cognitive Psychology
  • Linguistics and Language


Cite this

    Okuno, H. G., Nakatani, T., & Kawabata, T. (1999). Listening to two simultaneous speeches. Speech Communication, 27(3), 299-310. https://doi.org/10.1016/S0167-6393(98)00080-6