Listening to two simultaneous speeches

Hiroshi G. Okuno, Tomohiro Nakatani, Takeshi Kawabata

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

Speech stream segregation is presented as a new speech enhancement technique for automatic speech recognition. Two issues are addressed: segregating speech streams from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition. Speech stream segregation is modeled as a process of extracting harmonic fragments, grouping the extracted fragments, and substituting the non-harmonic residue for the non-harmonic parts of each group. The main problem in interfacing speech stream segregation with hidden Markov model (HMM)-based speech recognition is how to reduce the degradation in recognition performance caused by the spectral distortion of the segregated sounds, which arises mainly from the transfer function of the binaural input. Our solution is to re-train the HMM parameters on training data binauralized for four directions. Experiments with 500 mixtures of isolated words uttered by two women showed that the error reduction rates for 1-best and 10-best word recognition of each woman's utterance are, on average, 64% and 75%, respectively.
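The abstract describes the segregation pipeline only at a high level, so a minimal single-channel sketch of the harmonic-fragment idea is given below. It is not the authors' binaural system: the pitch-continuity grouping of fragments into per-speaker streams and the residue-substitution step are only crudely imitated, and every function name, threshold and parameter here (estimate_f0, harmonic_mask, segregate, the 40 Hz harmonic tolerance, the 0.3 voicing threshold) is an illustrative assumption rather than a detail taken from the paper.

# Minimal illustrative sketch of harmonic-fragment segregation (not the
# authors' implementation): single channel, NumPy only. Frames are pitch-
# tracked by autocorrelation, bins near harmonics of the estimated F0 form
# the "harmonic fragment", and everything else is kept as non-harmonic residue.
import numpy as np

def estimate_f0(frame, sr, fmin=80.0, fmax=400.0):
    # Autocorrelation pitch estimate in Hz; None if the frame looks unvoiced.
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    if hi >= len(ac) or ac[0] <= 0:
        return None
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag if ac[lag] / ac[0] > 0.3 else None  # 0.3: arbitrary voicing threshold

def harmonic_mask(n_bins, sr, n_fft, f0, width_hz=40.0):
    # Binary spectral mask keeping bins within width_hz of a multiple of f0.
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)[:n_bins]
    dist = np.abs((freqs + f0 / 2) % f0 - f0 / 2)  # distance to nearest harmonic
    return (dist < width_hz).astype(float)

def segregate(mixture, sr, n_fft=1024, hop=256):
    # Split a mono mixture into (harmonic-fragment stream, non-harmonic residue)
    # by overlap-add of masked and unmasked spectra. Grouping fragments into
    # per-speaker streams by pitch continuity is deliberately omitted here.
    mixture = np.asarray(mixture, dtype=float)
    win = np.hanning(n_fft)
    stream = np.zeros_like(mixture)
    residue = np.zeros_like(mixture)
    for start in range(0, len(mixture) - n_fft, hop):
        seg = mixture[start:start + n_fft] * win
        spec = np.fft.rfft(seg)
        f0 = estimate_f0(seg, sr)
        mask = harmonic_mask(len(spec), sr, n_fft, f0) if f0 else np.zeros(len(spec))
        stream[start:start + n_fft] += np.fft.irfft(spec * mask, n_fft) * win
        residue[start:start + n_fft] += np.fft.irfft(spec * (1 - mask), n_fft) * win
    return stream, residue

For example, applying segregate to a 16 kHz mono mixture of two voices yields a harmonic-dominated signal plus a non-harmonic residue; in the paper's framework the residue would be substituted back into each grouped stream before the segregated signal is passed to the HMM recognizer. Regarding the reported figures, an error reduction rate is conventionally computed as (baseline error - error after processing) / baseline error, so the 64% 1-best result means the word error rate fell to roughly 36% of its baseline value.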

Original language: English
Pages (from-to): 299-310
Number of pages: 12
Journal: Speech Communication
Volume: 27
Issue number: 3
DOI: https://doi.org/10.1016/S0167-6393(98)00080-6
Publication status: Published - Apr 1999
Externally published: Yes

Fingerprint

Segregation
Speech recognition
Automatic speech recognition
Hidden Markov models
Fragment
Harmonic
Acoustic waves
Speech enhancement
Error reduction
Grouping
Transfer functions
Degradation
Phonetics
Speech
Model-based

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Experimental and Cognitive Psychology
  • Linguistics and Language

Cite this

Okuno, H. G., Nakatani, T., & Kawabata, T. (1999). Listening to two simultaneous speeches. Speech Communication, 27(3), 299-310. https://doi.org/10.1016/S0167-6393(98)00080-6
@article{43795228f57041acbca654d079d93722,
title = "Listening to two simultaneous speeches",
author = "Okuno, {Hiroshi G.} and Tomohiro Nakatani and Takeshi Kawabata",
year = "1999",
month = "4",
doi = "10.1016/S0167-6393(98)00080-6",
language = "English",
volume = "27",
pages = "299--310",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",
number = "3",

}
