MULTIMODAL TRANSLATION

Shigeo Morishima, Shin Ogata, Satoshi Nakamura

Research output: Paper, peer-reviewed

1 citation (Scopus)

Abstract

A stand-in is a common technique for movies and TV programs in foreign languages. The current stand-in technique, which substitutes only the voice channel, results in an awkward mismatch with the mouth motion. Videophones with automatic voice translation are expected to be widely used in the near future, and they may face the same problem without lip-synchronized speaking-face image translation. We introduce a multimodal English-to-Japanese and Japanese-to-English translation system that also translates the speaker's speech motion while synchronizing it to the translated speech. To retain the speaker's facial expression, we substitute only the image of the speech organs with a synthesized one, generated by a three-dimensional wire-frame model that is adaptable to any speaker. Our approach enables image synthesis and translation with an extremely small database. We also propose a method to track the motion of the face in the video image. In this system, the movement and rotation of the head are detected by template matching using a 3D personal face wire-frame model. With this technique, automatic multimodal translation can be achieved.
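The head-tracking step described above rests on template matching: a reference patch derived from the personal face model is searched for in each video frame. As a rough illustration of that core idea only (the paper matches against projections of a 3-D wire-frame model and recovers rotation as well; this sketch recovers just a 2-D offset with normalized cross-correlation, and all names here are hypothetical):

```python
import numpy as np

def track_by_template_matching(frame, template):
    """Find the offset of `template` in grayscale `frame` by
    normalized cross-correlation (brute-force search)."""
    fh, fw = frame.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            patch = frame[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score

# Embed a patch at a known position and recover it.
rng = np.random.default_rng(0)
frame = rng.random((40, 40))
template = frame[12:20, 25:33].copy()
pos, score = track_by_template_matching(frame, template)
print(pos)  # → (12, 25)
```

In the actual system, running this search over templates rendered at several candidate head poses would yield the translation and rotation parameters the abstract refers to.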

Original language: English
Pages: 98-103
Number of pages: 6
Publication status: Published - 2001
Event: 2001 International Conference on Auditory-Visual Speech Processing, AVSP 2001 - Aalborg, Denmark
Duration: 2001 Sep 7 → 2001 Sep 9

Conference

Conference: 2001 International Conference on Auditory-Visual Speech Processing, AVSP 2001
Country/Territory: Denmark
City: Aalborg
Period: 01/9/7 → 01/9/9

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
  • Otorhinolaryngology
