Response Timing Estimation for Spoken Dialog Systems Based on Syntactic Completeness Prediction

Jin Sakuma, Shinya Fujie, Tetsunori Kobayashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Appropriate response timing is essential for smooth dialog progression. Conventionally, prosodic, temporal, and linguistic features have been used to determine timing. In addition to these conventional features, we propose to use the syntactic completeness expected a short time ahead, which indicates whether the other party is about to finish speaking. We generate the next token sequence from intermediate speech recognition results using a language model and obtain the probability that the end of utterance appears K tokens ahead, where K varies from 1 to M. This yields an M-dimensional vector, which we call the estimates of syntactic completeness (ESC). We evaluated this method on a simulated dialog database of a restaurant information center. The results confirmed that considering ESC improves the performance of response timing estimation, especially the accuracy of quick responses, compared with a method using only conventional features.
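The ESC computation described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `<eou>` token name, the toy bigram table standing in for the language model, the function names, and the choice to extend the context greedily with the most likely non-final token at each step. The abstract does not specify how the token sequence is generated or how the probabilities are read off.

```python
from typing import Callable, Dict, List

EOU = "<eou>"  # hypothetical end-of-utterance token

# Toy stand-in for a language model (assumption): the next-token
# distribution depends only on the last token of the context.
TOY_LM: Dict[str, Dict[str, float]] = {
    "i": {"want": 0.9, EOU: 0.1},
    "want": {"sushi": 0.7, "a": 0.2, EOU: 0.1},
    "a": {"table": 0.95, EOU: 0.05},
    "sushi": {EOU: 0.8, "please": 0.2},
    "table": {EOU: 0.6, "please": 0.4},
    "please": {EOU: 1.0},
}


def next_token_probs(context: List[str]) -> Dict[str, float]:
    """Return the toy next-token distribution for the given context."""
    last = context[-1] if context else ""
    return TOY_LM.get(last, {EOU: 1.0})


def estimate_esc(
    context: List[str],
    m: int,
    lm: Callable[[List[str]], Dict[str, float]] = next_token_probs,
) -> List[float]:
    """Return an m-dimensional vector whose K-th entry is the model's
    end-of-utterance probability K tokens ahead of the partial transcript."""
    esc: List[float] = []
    ctx = list(context)
    for _ in range(m):
        probs = lm(ctx)
        esc.append(probs.get(EOU, 0.0))
        # Extend the context with the most likely non-final token
        # (greedy generation is an assumption of this sketch).
        continuations = {t: p for t, p in probs.items() if t != EOU}
        if continuations:
            ctx.append(max(continuations, key=continuations.get))
    return esc


# Partial recognition result "i want", looking M = 3 tokens ahead.
print(estimate_esc(["i", "want"], 3))  # [0.1, 0.8, 1.0]
```

In this toy example the vector rises from 0.1 to 1.0, which would suggest that the utterance is likely to end within the next two or three tokens. In a real system the bigram table would be replaced by a trained language model, and the resulting vector would be combined with the prosodic, temporal, and linguistic features mentioned above.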

Original language: English
Title of host publication: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 369-374
Number of pages: 6
ISBN (Electronic): 9798350396904
Publication status: Published - 2023
Event: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Doha, Qatar
Duration: 2023 Jan 9 → 2023 Jan 12

Publication series

Name: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings

Conference

Conference: 2022 IEEE Spoken Language Technology Workshop, SLT 2022
Country/Territory: Qatar
City: Doha
Period: 23/1/9 → 23/1/12

Keywords

  • Response Timing
  • Spoken Dialog System
  • Turn-taking

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Media Technology
  • Instrumentation
  • Linguistics and Language
