Speech spotter

On-demand speech recognition in human-human conversation on the telephone or in face-to-face situations

Masataka Goto, Koji Kitayama, Katunobu Itou, Tetsunori Kobayashi

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

    8 Citations (Scopus)

    Abstract

    This paper describes a novel speech-interface function, called "speech spotter", which enables a user to enter voice commands into a speech recognizer in the midst of natural human-human conversation. In the past, it has been difficult to use automatic speech recognition in human-human conversation since it was not easy to judge, from only microphone input, whether a user was speaking to another person or a speech recognizer. We solve this problem by using two kinds of nonverbal speech information: a filled pause (a vowel-lengthening hesitation like "er⋯") and voice pitch. Only when a user utters a voice command with a high pitch just after a filled pause is the voice command accepted by the speech recognizer. By using this speech-spotter function, we have built two application systems: an on-demand information system for assisting human-human conversation and a music-playback system for enriching telephone conversation. The results from using these systems have shown that the speech-spotter function is robust and convenient enough to be used in face-to-face or cellular-phone conversations.
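The acceptance rule in the abstract (a command counts only when it is spoken at a high pitch just after a filled pause) can be illustrated with a minimal sketch. This is not the authors' implementation. The `Segment` type, the `spot_commands` function, the pitch ratio of 1.3, and the 0.5-second gap limit are all hypothetical, and the filled-pause flag and pitch values are assumed to come from upstream detectors that are not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One stretch of speech, as produced by hypothetical upstream detectors."""
    start: float            # start time in seconds
    end: float              # end time in seconds
    is_filled_pause: bool   # True if a filled-pause detector flagged this segment
    mean_f0_hz: float       # mean voice pitch (F0) of the segment in Hz
    text: str = ""          # recognizer hypothesis; empty for filled pauses


def spot_commands(segments: List[Segment],
                  baseline_f0_hz: float,
                  pitch_ratio: float = 1.3,   # assumed threshold, not from the paper
                  max_gap_s: float = 0.5      # assumed threshold, not from the paper
                  ) -> List[str]:
    """Return the utterances that satisfy both nonverbal cues."""
    accepted = []
    for prev, cur in zip(segments, segments[1:]):
        # Cue 1: the utterance starts shortly after a filled pause ends.
        follows_pause = prev.is_filled_pause and (cur.start - prev.end) <= max_gap_s
        # Cue 2: the utterance is clearly higher than the speaker's usual pitch.
        high_pitch = cur.mean_f0_hz >= pitch_ratio * baseline_f0_hz
        if follows_pause and high_pitch and not cur.is_filled_pause:
            accepted.append(cur.text)
    return accepted


# Illustrative data only: times, pitches, and phrases are made up.
segments = [
    Segment(0.0, 1.2, False, 120.0, "so what do you think"),
    Segment(1.4, 1.9, True, 118.0),                       # filled pause ("er...")
    Segment(2.0, 3.0, False, 175.0, "play some jazz"),    # high pitch after pause
    Segment(3.5, 4.5, False, 180.0, "that sounds nice"),  # high pitch, no pause
    Segment(5.0, 5.4, True, 119.0),                       # filled pause
    Segment(5.5, 6.5, False, 121.0, "I am not sure"),     # pause, normal pitch
]

commands = spot_commands(segments, baseline_f0_hz=120.0)
print(commands)
```

Requiring both cues together is what lets ordinary conversation pass through: in this toy data, an excited high-pitched remark and a hesitant normal-pitched remark are both ignored.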

    Original language: English
    Title of host publication: 8th International Conference on Spoken Language Processing, ICSLP 2004
    Publisher: International Speech Communication Association
    Pages: 1533-1536
    Number of pages: 4
    Publication status: Published - 2004
    Event: 8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, Korea, Republic of
    Duration: 2004 Oct 4 to 2004 Oct 8

    Other

    Other: 8th International Conference on Spoken Language Processing, ICSLP 2004
    Country: Korea, Republic of
    City: Jeju, Jeju Island
    Period: 04/10/4 to 04/10/8

    Fingerprint

    telephone
    conversation
    demand
    Speech Recognition
    Telephone
    speaking
    information system
    music
    human being

    ASJC Scopus subject areas

    • Language and Linguistics
    • Linguistics and Language

    Cite this

    Goto, M., Kitayama, K., Itou, K., & Kobayashi, T. (2004). Speech spotter: On-demand speech recognition in human-human conversation on the telephone or in face-to-face situations. In 8th International Conference on Spoken Language Processing, ICSLP 2004 (pp. 1533-1536). International Speech Communication Association.

    @inproceedings{70ef31d727444340a43d564b71cc8fe2,
    title = "Speech spotter: On-demand speech recognition in human-human conversation on the telephone or in face-to-face situations",
    abstract = "This paper describes a novel speech-interface function, called {"}speech spotter{"}, which enables a user to enter voice commands into a speech recognizer in the midst of natural human-human conversation. In the past, it has been difficult to use automatic speech recognition in human-human conversation since it was not easy to judge, from only microphone input, whether a user was speaking to another person or a speech recognizer. We solve this problem by using two kinds of nonverbal speech information: a filled pause (a vowel-lengthening hesitation like {"}er⋯{"}) and voice pitch. Only when a user utters a voice command with a high pitch just after a filled pause is the voice command accepted by the speech recognizer. By using this speech-spotter function, we have built two application systems: an on-demand information system for assisting human-human conversation and a music-playback system for enriching telephone conversation. The results from using these systems have shown that the speech-spotter function is robust and convenient enough to be used in face-to-face or cellular-phone conversations.",
    author = "Masataka Goto and Koji Kitayama and Katunobu Itou and Tetsunori Kobayashi",
    year = "2004",
    language = "English",
    pages = "1533--1536",
    booktitle = "8th International Conference on Spoken Language Processing, ICSLP 2004",
    publisher = "International Speech Communication Association",

    }

    TY - GEN

    T1 - Speech spotter

    T2 - On-demand speech recognition in human-human conversation on the telephone or in face-to-face situations

    AU - Goto, Masataka

    AU - Kitayama, Koji

    AU - Itou, Katunobu

    AU - Kobayashi, Tetsunori

    PY - 2004

    Y1 - 2004

    N2 - This paper describes a novel speech-interface function, called "speech spotter", which enables a user to enter voice commands into a speech recognizer in the midst of natural human-human conversation. In the past, it has been difficult to use automatic speech recognition in human-human conversation since it was not easy to judge, from only microphone input, whether a user was speaking to another person or a speech recognizer. We solve this problem by using two kinds of nonverbal speech information: a filled pause (a vowel-lengthening hesitation like "er⋯") and voice pitch. Only when a user utters a voice command with a high pitch just after a filled pause is the voice command accepted by the speech recognizer. By using this speech-spotter function, we have built two application systems: an on-demand information system for assisting human-human conversation and a music-playback system for enriching telephone conversation. The results from using these systems have shown that the speech-spotter function is robust and convenient enough to be used in face-to-face or cellular-phone conversations.

    UR - http://www.scopus.com/inward/record.url?scp=33745215565&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=33745215565&partnerID=8YFLogxK

    M3 - Conference contribution

    SP - 1533

    EP - 1536

    BT - 8th International Conference on Spoken Language Processing, ICSLP 2004

    PB - International Speech Communication Association

    ER -