Predicting listener back-channels for human-agent interaction using neuro-dynamical model

Shotaro Sano, Shun Nishide, Hiroshi G. Okuno, Tetsuya Ogata

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

The goal of our work is to create natural verbal interaction between humans and speech dialogue agents. In this paper, we focus on enabling speech dialogue agents to generate back-channel responses the way human listeners do. To do so, the system must predict the appropriate timing of a back-channel from the human's speech. As the prediction model, we use a neuro-dynamical system called a multiple timescale recurrent neural network (MTRNN). The model is trained on an actual corpus of a poster session from the IMADE project, using the presenter's prosodic and visual information as features. Using the model, we conducted back-channel timing prediction experiments. The results showed that our system could predict back-channel timing about 0.5 seconds before the back-channel response is generated. Compared with the actual back-channel timing in the corpus, the system achieved a recall of 37.1%, a precision of 31.7%, and an F-measure of 34.2%. These results indicate that the model can effectively predict and generate back-channel responses.
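The three scores reported above are mutually consistent if the F-measure is taken to be the balanced F1 score, that is, the harmonic mean of precision and recall. The abstract does not state which F-measure variant was used, so that is an assumption here, and the function name `f_measure` is purely illustrative. A minimal sketch of the check:

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if precision + recall == 0.0:
        # Degenerate case: no correct predictions at all.
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Values reported in the abstract.
precision = 0.317
recall = 0.371

f1 = f_measure(precision, recall)
print(f"F-measure: {f1:.1%}")  # prints "F-measure: 34.2%"
```

Under this assumption the computed value rounds to the 34.2% given in the abstract.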

Original language: English
Title of host publication: 2011 IEEE/SICE International Symposium on System Integration, SII 2011
Pages: 18-23
Number of pages: 6
Publication status: Published - 2011 Dec 1
Externally published: Yes
Event: 2011 IEEE/SICE International Symposium on System Integration, SII 2011 - Kyoto, Japan
Duration: 2011 Dec 20 → 2011 Dec 22

Publication series

Name: 2011 IEEE/SICE International Symposium on System Integration, SII 2011

Conference

Conference: 2011 IEEE/SICE International Symposium on System Integration, SII 2011
Country: Japan
City: Kyoto
Period: 11/12/20 → 11/12/22

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Control and Systems Engineering
