Predicting listener back-channels for human-agent interaction using neuro-dynamical model

Shotaro Sano, Shun Nishide, Hiroshi G. Okuno, Tetsuya Ogata

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

The goal of our work is to create natural verbal interaction between humans and speech dialogue agents. In this paper, we focus on generating back-channels for speech dialogue agents in the same way humans do. To do so, the system needs to predict the appropriate timing of a back-channel on the basis of the human's speech. For the prediction model, we use a neuro-dynamical system called a multiple timescale recurrent neural network (MTRNN). The model is trained on an actual corpus of a poster session from the IMADE project, using the presenter's prosodic and visual information as features. Using the model, we conducted back-channel timing prediction experiments. The results showed that our system could predict back-channel timing about 0.5 seconds before the back-channel response was generated. Compared with the actual back-channel timing in the corpus, the system achieved a recall of 37.1%, a precision of 31.7%, and an F-measure of 34.2%. These results show that the model can effectively predict and generate back-channel responses.
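As a quick consistency check on the figures in the abstract, the sketch below recomputes the F-measure from the reported precision and recall. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced F1 variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as fractions.
precision = 0.317
recall = 0.371

f1 = f_measure(precision, recall)
print(f"F-measure: {f1 * 100:.1f}%")  # prints "F-measure: 34.2%"
```

Under that assumption, the recomputed value agrees with the 34.2% reported in the abstract.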

Original language: English
Title of host publication: 2011 IEEE/SICE International Symposium on System Integration, SII 2011
Pages: 18-23
Number of pages: 6
DOI: https://doi.org/10.1109/SII.2011.6147412
Publication status: Published - 2011
Externally published: Yes
Event: 2011 IEEE/SICE International Symposium on System Integration, SII 2011 - Kyoto
Duration: 2011 Dec 20 to 2011 Dec 22

Other

Other: 2011 IEEE/SICE International Symposium on System Integration, SII 2011
City: Kyoto
Period: 11/12/20 to 11/12/22

Fingerprint

Recurrent neural networks
Dynamical systems
Experiments

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Control and Systems Engineering

Cite this

Sano, S., Nishide, S., Okuno, H. G., & Ogata, T. (2011). Predicting listener back-channels for human-agent interaction using neuro-dynamical model. In 2011 IEEE/SICE International Symposium on System Integration, SII 2011 (pp. 18-23). [6147412] https://doi.org/10.1109/SII.2011.6147412

Predicting listener back-channels for human-agent interaction using neuro-dynamical model. / Sano, Shotaro; Nishide, Shun; Okuno, Hiroshi G.; Ogata, Tetsuya.

2011 IEEE/SICE International Symposium on System Integration, SII 2011. 2011. p. 18-23 6147412.

Sano, S, Nishide, S, Okuno, HG & Ogata, T 2011, Predicting listener back-channels for human-agent interaction using neuro-dynamical model. in 2011 IEEE/SICE International Symposium on System Integration, SII 2011., 6147412, pp. 18-23, 2011 IEEE/SICE International Symposium on System Integration, SII 2011, Kyoto, 11/12/20. https://doi.org/10.1109/SII.2011.6147412
Sano S, Nishide S, Okuno HG, Ogata T. Predicting listener back-channels for human-agent interaction using neuro-dynamical model. In 2011 IEEE/SICE International Symposium on System Integration, SII 2011. 2011. p. 18-23. 6147412 https://doi.org/10.1109/SII.2011.6147412
Sano, Shotaro ; Nishide, Shun ; Okuno, Hiroshi G. ; Ogata, Tetsuya. / Predicting listener back-channels for human-agent interaction using neuro-dynamical model. 2011 IEEE/SICE International Symposium on System Integration, SII 2011. 2011. pp. 18-23
@inproceedings{1ab522de9b9d4ac0a1068567989cd7c9,
title = "Predicting listener back-channels for human-agent interaction using neuro-dynamical model",
abstract = "The goal of our work is to create natural verbal interaction between humans and speech dialogue agents. In this paper, we focus on generations of back-channel for speech dialogue agents the same way humans do. To create such a system, the system needs to predict the appropriate timing of back-channel on the basis of the human's speech. For the prediction model, we use a neuro-dynamical system called a multiple timescale recurrent neural network (MTRNN). The model is trained using an actual corpus of a poster session of the IMADE project using the presenter's prosodic and visual information as features. Using the model, we conducted back-channel timing prediction experiments. The results showed that our system could predict back-channel timing about 0.5 seconds before generation of back-channel response. Comparing the results with the actual back-channel timing in the corpus, the system showed 37.1{\%} of recall, 31.7{\%} of precision, and 34.2{\%} of F-measure. These results show the model to effectively predict and generate back-channel responses.",
author = "Shotaro Sano and Shun Nishide and Okuno, {Hiroshi G.} and Tetsuya Ogata",
year = "2011",
doi = "10.1109/SII.2011.6147412",
language = "English",
isbn = "9781457715235",
pages = "18--23",
booktitle = "2011 IEEE/SICE International Symposium on System Integration, SII 2011",
}

TY - GEN
T1 - Predicting listener back-channels for human-agent interaction using neuro-dynamical model
AU - Sano, Shotaro
AU - Nishide, Shun
AU - Okuno, Hiroshi G.
AU - Ogata, Tetsuya
PY - 2011
Y1 - 2011

N2 - The goal of our work is to create natural verbal interaction between humans and speech dialogue agents. In this paper, we focus on generations of back-channel for speech dialogue agents the same way humans do. To create such a system, the system needs to predict the appropriate timing of back-channel on the basis of the human's speech. For the prediction model, we use a neuro-dynamical system called a multiple timescale recurrent neural network (MTRNN). The model is trained using an actual corpus of a poster session of the IMADE project using the presenter's prosodic and visual information as features. Using the model, we conducted back-channel timing prediction experiments. The results showed that our system could predict back-channel timing about 0.5 seconds before generation of back-channel response. Comparing the results with the actual back-channel timing in the corpus, the system showed 37.1% of recall, 31.7% of precision, and 34.2% of F-measure. These results show the model to effectively predict and generate back-channel responses.

AB - The goal of our work is to create natural verbal interaction between humans and speech dialogue agents. In this paper, we focus on generations of back-channel for speech dialogue agents the same way humans do. To create such a system, the system needs to predict the appropriate timing of back-channel on the basis of the human's speech. For the prediction model, we use a neuro-dynamical system called a multiple timescale recurrent neural network (MTRNN). The model is trained using an actual corpus of a poster session of the IMADE project using the presenter's prosodic and visual information as features. Using the model, we conducted back-channel timing prediction experiments. The results showed that our system could predict back-channel timing about 0.5 seconds before generation of back-channel response. Comparing the results with the actual back-channel timing in the corpus, the system showed 37.1% of recall, 31.7% of precision, and 34.2% of F-measure. These results show the model to effectively predict and generate back-channel responses.

UR - http://www.scopus.com/inward/record.url?scp=84857575200&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84857575200&partnerID=8YFLogxK
U2 - 10.1109/SII.2011.6147412
DO - 10.1109/SII.2011.6147412
M3 - Conference contribution
SN - 9781457715235
SP - 18
EP - 23
BT - 2011 IEEE/SICE International Symposium on System Integration, SII 2011
ER -