New speech enhancement: Speech stream segregation

Hiroshi G. Okuno, Tomohiro Nakatani, Takeshi Kawabata

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

4 Citations (Scopus)

Abstract

Speech stream segregation is presented as a new speech enhancement method for automatic speech recognition. Two issues are addressed: segregating a speech stream from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition. Speech stream segregation is modeled as a process of extracting harmonic fragments, grouping the extracted harmonic fragments, and substituting the non-harmonic residue for the non-harmonic parts of each group. The main problem in interfacing speech stream segregation with HMM-based speech recognition is how to reduce the degradation of recognition performance caused by spectral distortion of the segregated sounds, which stems mainly from the transfer function of the binaural input. Our solution is to re-train the HMM parameters with training data binauralized for four directions. Experiments with 500 mixtures of two women's utterances of a word showed that the cumulative word recognition accuracy up to the 10th candidate was 75% on average for each woman's utterance.
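The reported figure is a cumulative accuracy "up to the 10th candidate": an utterance counts as correct if the reference word appears anywhere among the recognizer's first 10 hypotheses. A minimal sketch of that metric, assuming each recognizer output is an N-best list of word hypotheses sorted from most to least likely; the function name `cumulative_top_n_accuracy` and the sample words are hypothetical and not taken from the paper.

```python
from typing import Sequence


def cumulative_top_n_accuracy(
    candidates: Sequence[Sequence[str]],
    references: Sequence[str],
    n: int = 10,
) -> float:
    """Fraction of utterances whose reference word appears among the
    first n candidates of its N-best list (assumed sorted best-first)."""
    if len(candidates) != len(references):
        raise ValueError("candidates and references must have the same length")
    if not references:
        raise ValueError("at least one utterance is required")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Count a hit whenever the reference is within the top-n hypotheses.
    hits = sum(
        1 for nbest, ref in zip(candidates, references) if ref in nbest[:n]
    )
    return hits / len(references)


# Hypothetical N-best lists for three segregated utterances.
nbest_lists = [
    ["kyoto", "tokyo", "osaka"],
    ["nara", "kobe", "sendai"],
    ["sapporo", "fukuoka", "hiroshima"],
]
reference_words = ["tokyo", "sendai", "nagoya"]

print(cumulative_top_n_accuracy(nbest_lists, reference_words, n=1))  # 0.0
print(cumulative_top_n_accuracy(nbest_lists, reference_words, n=3))  # 2 of 3 references found
```

For two-speaker mixtures, one would presumably compute this value separately for each speaker's segregated stream and then average the two; the exact averaging procedure used in the experiments is not stated in this record.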

Original language: English
Title of host publication: International Conference on Spoken Language Processing, ICSLP, Proceedings
Editors: Anon
Place of Publication: Piscataway, NJ, United States
Publisher: IEEE
Pages: 2356-2359
Number of pages: 4
Volume: 4
Publication status: Published - 1996
Externally published: Yes
Event: Proceedings of the 1996 International Conference on Spoken Language Processing, ICSLP. Part 1 (of 4) - Philadelphia, PA, USA
Duration: 1996 Oct 3 to 1996 Oct 6

Fingerprint

Speech enhancement
Speech recognition
Acoustic waves
Transfer functions
Degradation
Experiments

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Okuno, H. G., Nakatani, T., & Kawabata, T. (1996). New speech enhancement: Speech stream segregation. In Anon (Ed.), International Conference on Spoken Language Processing, ICSLP, Proceedings (Vol. 4, pp. 2356-2359). Piscataway, NJ, United States: IEEE.

New speech enhancement : Speech stream segregation. / Okuno, Hiroshi G.; Nakatani, Tomohiro; Kawabata, Takeshi.

International Conference on Spoken Language Processing, ICSLP, Proceedings. ed. / Anon. Vol. 4 Piscataway, NJ, United States : IEEE, 1996. p. 2356-2359.

Okuno, HG, Nakatani, T & Kawabata, T 1996, New speech enhancement: Speech stream segregation. in Anon (ed.), International Conference on Spoken Language Processing, ICSLP, Proceedings. vol. 4, IEEE, Piscataway, NJ, United States, pp. 2356-2359, Proceedings of the 1996 International Conference on Spoken Language Processing, ICSLP. Part 1 (of 4), Philadelphia, PA, USA, 96/10/3.
Okuno HG, Nakatani T, Kawabata T. New speech enhancement: Speech stream segregation. In Anon, editor, International Conference on Spoken Language Processing, ICSLP, Proceedings. Vol. 4. Piscataway, NJ, United States: IEEE. 1996. p. 2356-2359
Okuno, Hiroshi G. ; Nakatani, Tomohiro ; Kawabata, Takeshi. / New speech enhancement : Speech stream segregation. International Conference on Spoken Language Processing, ICSLP, Proceedings. editor / Anon. Vol. 4 Piscataway, NJ, United States : IEEE, 1996. pp. 2356-2359
@inproceedings{a8c696420dd943fc8ac94f14f33723f2,
title = "New speech enhancement: Speech stream segregation",
abstract = "Speech stream segregation is presented as a new speech enhancement method for automatic speech recognition. Two issues are addressed: segregating a speech stream from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition. Speech stream segregation is modeled as a process of extracting harmonic fragments, grouping the extracted harmonic fragments, and substituting the non-harmonic residue for the non-harmonic parts of each group. The main problem in interfacing speech stream segregation with HMM-based speech recognition is how to reduce the degradation of recognition performance caused by spectral distortion of the segregated sounds, which stems mainly from the transfer function of the binaural input. Our solution is to re-train the HMM parameters with training data binauralized for four directions. Experiments with 500 mixtures of two women's utterances of a word showed that the cumulative word recognition accuracy up to the 10th candidate was 75{\%} on average for each woman's utterance.",
author = "Okuno, {Hiroshi G.} and Tomohiro Nakatani and Takeshi Kawabata",
year = "1996",
language = "English",
volume = "4",
pages = "2356--2359",
editor = "Anon",
booktitle = "International Conference on Spoken Language Processing, ICSLP, Proceedings",
publisher = "IEEE",
address = "Piscataway, NJ, United States",
}

TY  - GEN
T1  - New speech enhancement: Speech stream segregation
AU  - Okuno, Hiroshi G.
AU  - Nakatani, Tomohiro
AU  - Kawabata, Takeshi
PY  - 1996
Y1  - 1996
AB  - Speech stream segregation is presented as a new speech enhancement method for automatic speech recognition. Two issues are addressed: segregating a speech stream from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition. Speech stream segregation is modeled as a process of extracting harmonic fragments, grouping the extracted harmonic fragments, and substituting the non-harmonic residue for the non-harmonic parts of each group. The main problem in interfacing speech stream segregation with HMM-based speech recognition is how to reduce the degradation of recognition performance caused by spectral distortion of the segregated sounds, which stems mainly from the transfer function of the binaural input. Our solution is to re-train the HMM parameters with training data binauralized for four directions. Experiments with 500 mixtures of two women's utterances of a word showed that the cumulative word recognition accuracy up to the 10th candidate was 75% on average for each woman's utterance.
UR  - http://www.scopus.com/inward/record.url?scp=0030351621&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0030351621&partnerID=8YFLogxK
M3  - Conference contribution
VL  - 4
SP  - 2356
EP  - 2359
BT  - International Conference on Spoken Language Processing, ICSLP, Proceedings
A2  - Anon
PB  - IEEE
CY  - Piscataway, NJ, United States
ER  - 