Response Timing Estimation for Spoken Dialog System using Dialog Act Estimation

Jin Sakuma*, Shinya Fujie, Tetsunori Kobayashi

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

We propose neural networks for predicting the response timing of spoken dialog systems. Response timing varies depending on the dialog context. This context-dependent response timing is conventionally estimated directly from acoustic event sequences and word sequences extracted from past utterances. Because these sequences vary so widely, large amounts of training data are required to build reliable models. However, there are no large dialog databases annotated with response timings. The proposed method estimates the dialog act of each utterance as an auxiliary task and uses its intermediate states for response timing estimation, in addition to acoustic and linguistic features. Since dialog acts have significantly less variation than word sequences and are closely related to response timing, we expect to be able to construct a highly reliable model even with a small amount of training data. We evaluate our approach on the HARPERVALLEYBANK corpus. The experimental results show that the proposed approach is more effective than the conventional approach, which does not use dialog act information for each utterance.

Original language: English
Pages (from-to): 4486-4490
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2022-September
DOIs
Publication status: Published - 2022
Event: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of
Duration: 2022 Sep 18 – 2022 Sep 22

Keywords

  • dialog act estimation
  • response timing
  • spoken dialog system

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
