Speech starter: Noise-robust endpoint detection by using filled pauses

Koji Kitayama, Masataka Goto, Katunobu Itou, Tetsunori Kobayashi

    Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

    9 Citations (Scopus)

    Abstract

    In this paper we propose a speech interface function, called speech starter, that enables noise-robust endpoint (utterance) detection for speech recognition. When current speech recognizers are used in a noisy environment, a typical recognition error is caused by incorrect endpoints because their automatic detection is likely to be disturbed by non-stationary noises. The speech starter function enables a user to specify the beginning of each utterance by uttering a filler with a filled pause, which is used as a trigger to start speech-recognition processes. Since filled pauses can be detected robustly in a noisy environment, practical endpoint detection is achieved. Speech starter also offers the advantage of providing a hands-free speech interface and it is user-friendly because a speaker tends to utter filled pauses (e.g., "er.") at the beginning of utterances when hesitating in human-human communication. Experimental results from a 10-dB-SNR noisy environment show that the recognition error rate with speech starter was lower than with conventional endpoint-detection methods.
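The triggering idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how filled pauses are detected, so the criterion below (a sustained voiced run with nearly flat pitch) is an assumption made purely for illustration, as are the function name `filled_pause_triggers` and the parameters `min_frames` and `max_rel_step`. It is not the authors' detector.

```python
from typing import List, Sequence


def filled_pause_triggers(
    f0_hz: Sequence[float],
    min_frames: int = 20,
    max_rel_step: float = 0.02,
) -> List[int]:
    """Return frame indices at which a sustained, flat-pitch voiced run is confirmed.

    Assumed (hypothetical) criterion: a frame extends the current run if it is
    voiced (f0 > 0) and its F0 differs from the previous voiced frame by at most
    max_rel_step, relative to the previous value. A trigger fires once per run,
    at the frame where the run length first reaches min_frames.
    """
    triggers: List[int] = []
    run = 0      # length of the current flat-pitch voiced run, in frames
    prev = 0.0   # F0 of the previous frame (0 means unvoiced)
    for i, f0 in enumerate(f0_hz):
        if f0 > 0 and prev > 0 and abs(f0 - prev) <= max_rel_step * prev:
            run += 1   # pitch stayed nearly flat: extend the run
        elif f0 > 0:
            run = 1    # voiced, but pitch jumped or voicing just began: new run
        else:
            run = 0    # unvoiced frame: no run
        if run == min_frames:
            triggers.append(i)  # filled pause hypothesized; recognition could start here
        prev = f0
    return triggers


# Synthetic F0 track: silence, a 25-frame flat "er"-like segment, silence,
# then ordinary speech with a rapidly changing pitch.
f0_track = [0.0] * 5 + [120.0] * 25 + [0.0] * 3 + [150.0, 180.0, 210.0, 180.0, 150.0] * 2
print(filled_pause_triggers(f0_track))  # [24]
```

In an interface of this kind, the recognizer would stay idle until a trigger frame is reported and begin decoding from there, rather than relying on energy-based endpoint detection that non-stationary noise can disturb.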

    Original language: English
    Title of host publication: EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology
    Publisher: International Speech Communication Association
    Pages: 1237-1240
    Number of pages: 4
    Publication status: Published - 2003
    Event: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland
    Duration: 2003 Sep 1 to 2003 Sep 4

    Other

    Other: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003
    Country: Switzerland
    City: Geneva
    Period: 03/9/1 to 03/9/4

    Fingerprint

    Starters
    Speech recognition
    Acoustic noise
    Fillers
    Communication

    ASJC Scopus subject areas

    • Computer Science Applications
    • Software
    • Linguistics and Language
    • Communication

    Cite this

    Kitayama, K., Goto, M., Itou, K., & Kobayashi, T. (2003). Speech starter: Noise-robust endpoint detection by using filled pauses. In EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology (pp. 1237-1240). International Speech Communication Association.

    Speech starter: Noise-robust endpoint detection by using filled pauses. / Kitayama, Koji; Goto, Masataka; Itou, Katunobu; Kobayashi, Tetsunori.

    EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology. International Speech Communication Association, 2003. p. 1237-1240.

    Kitayama, K, Goto, M, Itou, K & Kobayashi, T 2003, Speech starter: Noise-robust endpoint detection by using filled pauses. in EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology. International Speech Communication Association, pp. 1237-1240, 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, Geneva, Switzerland, 03/9/1.
    Kitayama K, Goto M, Itou K, Kobayashi T. Speech starter: Noise-robust endpoint detection by using filled pauses. In EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology. International Speech Communication Association. 2003. p. 1237-1240
    Kitayama, Koji; Goto, Masataka; Itou, Katunobu; Kobayashi, Tetsunori. / Speech starter: Noise-robust endpoint detection by using filled pauses. EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology. International Speech Communication Association, 2003. pp. 1237-1240
    @inproceedings{c0419ce4b5ff4a4a9ed54decebed716e,
    title = "Speech starter: Noise-robust endpoint detection by using filled pauses",
    abstract = "In this paper we propose a speech interface function, called speech starter, that enables noise-robust endpoint (utterance) detection for speech recognition. When current speech recognizers are used in a noisy environment, a typical recognition error is caused by incorrect endpoints because their automatic detection is likely to be disturbed by non-stationary noises. The speech starter function enables a user to specify the beginning of each utterance by uttering a filler with a filled pause, which is used as a trigger to start speech-recognition processes. Since filled pauses can be detected robustly in a noisy environment, practical endpoint detection is achieved. Speech starter also offers the advantage of providing a hands-free speech interface and it is user-friendly because a speaker tends to utter filled pauses (e.g., {"}er.{"}) at the beginning of utterances when hesitating in human-human communication. Experimental results from a 10-dB-SNR noisy environment show that the recognition error rate with speech starter was lower than with conventional endpoint-detection methods.",
    author = "Koji Kitayama and Masataka Goto and Katunobu Itou and Tetsunori Kobayashi",
    year = "2003",
    language = "English",
    pages = "1237--1240",
    booktitle = "EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology",
    publisher = "International Speech Communication Association",

    }

    TY - GEN

    T1 - Speech starter

    T2 - Noise-robust endpoint detection by using filled pauses

    AU - Kitayama, Koji

    AU - Goto, Masataka

    AU - Itou, Katunobu

    AU - Kobayashi, Tetsunori

    PY - 2003

    Y1 - 2003

    N2 - In this paper we propose a speech interface function, called speech starter, that enables noise-robust endpoint (utterance) detection for speech recognition. When current speech recognizers are used in a noisy environment, a typical recognition error is caused by incorrect endpoints because their automatic detection is likely to be disturbed by non-stationary noises. The speech starter function enables a user to specify the beginning of each utterance by uttering a filler with a filled pause, which is used as a trigger to start speech-recognition processes. Since filled pauses can be detected robustly in a noisy environment, practical endpoint detection is achieved. Speech starter also offers the advantage of providing a hands-free speech interface and it is user-friendly because a speaker tends to utter filled pauses (e.g., "er.") at the beginning of utterances when hesitating in human-human communication. Experimental results from a 10-dB-SNR noisy environment show that the recognition error rate with speech starter was lower than with conventional endpoint-detection methods.

    UR - http://www.scopus.com/inward/record.url?scp=85009165688&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85009165688&partnerID=8YFLogxK

    M3 - Conference contribution

    AN - SCOPUS:85009165688

    SP - 1237

    EP - 1240

    BT - EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology

    PB - International Speech Communication Association

    ER -