Timing generating networks: Neural network based precise turn-taking timing prediction in multiparty conversation

Shinya Fujie*, Hayato Katayama, Jin Sakuma, Tetsunori Kobayashi

*Corresponding author for this work

Research output: Conference contribution

Abstract

A novel neural-network-based framework for precise timing generation, named the Timing Generating Network (TGN), is proposed and applied to turn-taking timing decision problems. Turn-taking problems have conventionally been formalized as detection of the user's end of turn, but this approach cannot estimate the precise timing at which a spoken dialogue system should take a turn and start its utterance. Several conventional approaches do estimate precise timings, but because the estimation is executed only at or after the end of the preceding user utterance, they depend heavily on the accuracy of intermediate decision modules such as voice activity detection. The TGN has two advantages: its parameters are tunable via error backpropagation because it is described in a differentiable form as a whole, and it is free from inter-module error propagation because it has no deterministic intermediate modules. The experimental results show that the proposed system is superior to a conventional turn-taking system that adopts hard decisions on the user's voice activity detection and response time estimation.

Original language: English
Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Publisher: International Speech Communication Association
Pages: 3771-3775
Number of pages: 5
ISBN (Electronic): 9781713836902
DOI
Publication status: Published - 2021
Event: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration: 30 Aug 2021 to 3 Sep 2021

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
5
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Country/Territory: Czech Republic
City: Brno
Period: 30 Aug 2021 to 3 Sep 2021

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
