Audio-visual speech translation with automatic LIP synchronization and face tracking based on 3-D head model

Shigeo Morishima*, Shin Ogata, Kazumasa Murai, Satoshi Nakamura

*Corresponding author for this work

Research output: Conference article › Peer-reviewed

12 Citations (Scopus)

Abstract

Speech-to-speech translation has been studied to realize natural human communication beyond language barriers. For fully multi-modal natural communication, visual information such as facial and lip movements is also necessary. In this paper, we introduce a multi-modal English-to-Japanese and Japanese-to-English translation system that also translates the speaker's speech motion while synchronizing it to the translated speech. To retain the speaker's facial expression, we replace only the image region of the speech organs with a synthesized one, generated from a three-dimensional wire-frame model that is adaptable to any speaker. This approach enables image synthesis and translation with an extremely small database. We conducted a subjective evaluation by connected-digit discrimination using data with and without audio-visual lip synchronization. The results confirm the sufficient quality of the proposed audio-visual translation system.

Original language: English
Pages (from-to): II/2117-II/2120
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2
Publication status: Published - 2002 Jul 11
Externally published: Yes
Event: 2002 IEEE International Conference on Acoustics, Speech and Signal Processing - Orlando, FL, United States
Duration: 2002 May 13 - 2002 May 17

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
