Predicting ASR errors by exploiting barge-in rate of individual users for spoken dialogue systems

Kazunori Komatani, Tatsuya Kawahara, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

We exploit the barge-in rate of individual users to predict automatic speech recognition (ASR) errors. A barge-in occurs when a user starts speaking during a system prompt, and it can be detected even when ASR results are unreliable. Features such as this, which do not depend on ASR results, provide a clue for handling situations in which user utterances cannot be recognized successfully. Because individual users of our system can be identified by their phone numbers, we accumulate how often each user barges in and use this rate as a user profile for deciding whether the current barge-in utterance should be accepted. We also set a window that reflects how the user's behavior changes over time as they become accustomed to the system. Experimental results show that setting the window improves the accuracy of predicting whether an utterance should be accepted. The experiments also clarify the minimum window width needed to improve accuracy.
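The per-user profile with a window described in the abstract can be sketched roughly as follows. This is only an illustration and not the authors' implementation: the class `BargeInProfile`, the function `accept_barge_in`, the fixed `threshold`, the window measured in number of utterances, and the assumption that a higher barge-in rate favors acceptance are all hypothetical choices made here, since the abstract does not specify the decision rule.

```python
from collections import defaultdict, deque


class BargeInProfile:
    """Tracks, per user, whether each recent utterance was a barge-in.

    Hypothetical sketch: `window` is the number of most recent utterances
    kept per user; `None` keeps the whole history (no window).
    """

    def __init__(self, window=None):
        self.window = window
        # deque(maxlen=None) is unbounded; otherwise old entries fall off.
        self._history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, user_id, barged_in):
        """Record one utterance for the user (identified, e.g., by phone number)."""
        self._history[user_id].append(bool(barged_in))

    def rate(self, user_id):
        """Return the barge-in rate over the window, or None for an unseen user."""
        history = self._history.get(user_id)
        if not history:
            return None
        return sum(history) / len(history)


def accept_barge_in(profile, user_id, threshold=0.5):
    """Decide whether to accept the current barge-in utterance.

    Assumption (not from the abstract): utterances from users whose
    barge-in rate reaches `threshold` are accepted; unseen users are not.
    """
    rate = profile.rate(user_id)
    return rate is not None and rate >= threshold


# A user who starts barging in only after getting used to the system.
utterances = [False, False, True, True, True]

windowed = BargeInProfile(window=3)
cumulative = BargeInProfile(window=None)
for barged_in in utterances:
    windowed.observe("user-a", barged_in)
    cumulative.observe("user-a", barged_in)

print(windowed.rate("user-a"))    # rate over the last 3 utterances
print(cumulative.rate("user-a"))  # rate over the whole history
print(accept_barge_in(windowed, "user-a", threshold=0.7))
print(accept_barge_in(cumulative, "user-a", threshold=0.7))
```

In this toy history the windowed rate follows the user's recent behavior while the cumulative rate still carries the early non-barge-in utterances, which is the kind of temporal transition the window is meant to reflect.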

Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Pages: 183-186
Number of pages: 4
Publication status: Published - 2008
Externally published: Yes
Event: INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association - Brisbane, QLD, Australia
Duration: 2008 Sep 22 to 2008 Sep 26

Other

Other: INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association
Country: Australia
City: Brisbane, QLD
Period: 08/9/22 to 08/9/26

Keywords

  • Barge-in
  • Spoken dialogue system
  • User modeling

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Sensory Systems

Cite this

Komatani, K., Kawahara, T., & Okuno, H. G. (2008). Predicting ASR errors by exploiting barge-in rate of individual users for spoken dialogue systems. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 183-186)

@inproceedings{b5fe9281e89d448ea3b6498a78939f65,
title = "Predicting ASR errors by exploiting barge-in rate of individual users for spoken dialogue systems",
keywords = "Barge-in, Spoken dialogue system, User modeling",
author = "Kazunori Komatani and Tatsuya Kawahara and Hiroshi G. Okuno",
year = "2008",
language = "English",
pages = "183--186",
booktitle = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

}

TY - GEN

T1 - Predicting ASR errors by exploiting barge-in rate of individual users for spoken dialogue systems

AU - Komatani, Kazunori

AU - Kawahara, Tatsuya

AU - Okuno, Hiroshi G.

PY - 2008

Y1 - 2008

KW - Barge-in

KW - Spoken dialogue system

KW - User modeling

UR - http://www.scopus.com/inward/record.url?scp=84867203995&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84867203995&partnerID=8YFLogxK

M3 - Conference contribution

SP - 183

EP - 186

BT - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

ER -