TY - GEN
T1 - Response Timing Estimation for Spoken Dialog Systems Based on Syntactic Completeness Prediction
AU - Sakuma, Jin
AU - Fujie, Shinya
AU - Kobayashi, Tetsunori
N1 - Funding Information: This research is supported by NII CRIS collaborative research program operated by NII CRIS and LINE Corporation. Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Appropriate response timing is very important for achieving smooth dialog progression. Conventionally, prosodic, temporal and linguistic features have been used to determine timing. In addition to the conventional parameters, we propose to utilize the syntactic completeness after a certain time, which represents whether the other party is about to finish speaking. We generate the next token sequence from intermediate speech recognition results using a language model and obtain the probability of the end of utterance appearing K tokens ahead, where K varies from 1 to M. We obtain an M-dimensional vector, which we denote as estimates of syntactic completeness (ESC). We evaluated this method on a simulated dialog database of a restaurant information center. The results confirmed that considering ESC improves the performance of response timing estimation, especially the accuracy in quick responses, compared with the method using only conventional features.
KW - Response Timing
KW - Spoken Dialog System
KW - Turn-taking
UR - http://www.scopus.com/inward/record.url?scp=85142396229&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142396229&partnerID=8YFLogxK
U2 - 10.1109/SLT54892.2023.10023458
DO - 10.1109/SLT54892.2023.10023458
M3 - Conference contribution
AN - SCOPUS:85142396229
T3 - 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
SP - 369
EP - 374
BT - 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE Spoken Language Technology Workshop, SLT 2022
Y2 - 9 January 2023 through 12 January 2023
ER -