Speech starter: Noise-robust endpoint detection by using filled pauses

Koji Kitayama, Masataka Goto, Katunobu Itou, Tetsunori Kobayashi

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    9 Citations (Scopus)

    Abstract

    In this paper we propose a speech interface function, called speech starter, that enables noise-robust endpoint (utterance) detection for speech recognition. When current speech recognizers are used in a noisy environment, recognition errors are typically caused by incorrect endpoints, because automatic endpoint detection is easily disturbed by non-stationary noise. The speech starter function lets a user specify the beginning of each utterance by uttering a filler with a filled pause, which is used as a trigger to start the speech-recognition process. Since filled pauses can be detected robustly in a noisy environment, practical endpoint detection is achieved. Speech starter also provides a hands-free speech interface, and it is user-friendly because speakers tend to utter filled pauses (e.g., "er") at the beginning of utterances when hesitating in human-human communication. Experimental results from a 10-dB-SNR noisy environment show that the recognition error rate with speech starter was lower than with conventional endpoint-detection methods.
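The triggering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a separate filled-pause detector has already produced one boolean flag per audio frame, and the function name `find_trigger` and the `min_frames` threshold are hypothetical choices made only for this illustration.

```python
from typing import List, Optional, Tuple


def find_trigger(flags: List[bool], min_frames: int = 20) -> Optional[Tuple[int, int]]:
    """Locate the first filled pause long enough to act as a trigger.

    flags      -- per-frame output of an assumed filled-pause detector
    min_frames -- hypothetical minimum run length that counts as a trigger

    Returns (pause_start_frame, trigger_frame), or None if no run of
    consecutive flagged frames reaches min_frames.
    """
    run = 0  # length of the current run of consecutive flagged frames
    for i, flag in enumerate(flags):
        run = run + 1 if flag else 0
        if run == min_frames:
            # The run began min_frames - 1 frames before the current one.
            return (i - min_frames + 1, i)
    return None


if __name__ == "__main__":
    # A short flagged burst (ignored), then a sustained filled pause.
    frames = [False] * 5 + [True] * 3 + [False] * 2 + [True] * 25
    print(find_trigger(frames))
```

In a setup like this, a recognizer would begin decoding only once a trigger is returned, so noise that never produces a sustained filled-pause run would not start recognition.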

    Original language: English
    Title of host publication: EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology
    Publisher: International Speech Communication Association
    Pages: 1237-1240
    Number of pages: 4
    Publication status: Published - 2003
    Event: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland
    Duration: 2003 Sep 1 to 2003 Sep 4

    Other

    Other: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003
    Country: Switzerland
    City: Geneva
    Period: 03/9/1 to 03/9/4

    ASJC Scopus subject areas

    • Computer Science Applications
    • Software
    • Linguistics and Language
    • Communication


  • Cite this

    Kitayama, K., Goto, M., Itou, K., & Kobayashi, T. (2003). Speech starter: Noise-robust endpoint detection by using filled pauses. In EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology (pp. 1237-1240). International Speech Communication Association.