Multi-modal translation system and its evaluation

Shigeo Morishima, S. Nakamura

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Speech-to-speech translation has been studied to realize natural human communication beyond language barriers. Toward further multi-modal natural communication, visual information such as face and lip movements will be necessary. We introduce a multi-modal English-to-Japanese and Japanese-to-English translation system that also translates the speaker's speech motion while synchronizing it to the translated speech. To retain the speaker's facial expression, we substitute only the image of the speech organs with a synthesized one, generated from a three-dimensional wire-frame model that can be adapted to any speaker. This approach enables image synthesis and translation with an extremely small database. We conducted subjective evaluation tests using a connected digit discrimination task on data with and without audio-visual lip synchronization. The results confirm the high quality of the proposed audio-visual translation system and the importance of lip synchronization.
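The pipeline the abstract describes (translate the speech, render the mouth region from a speaker-adapted 3-D wire-frame model, then composite it back onto the original face so the expression is preserved) can be sketched in outline. This is an illustrative sketch only; every function name and data stand-in below is hypothetical and not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """One video frame, split into the parts the system treats differently."""
    face: str   # stand-in for the original face pixels (expression kept as-is)
    mouth: str  # stand-in for the mouth-region pixels (replaced by synthesis)

def translate_speech(word: str) -> str:
    """Stub for the speech-to-speech translation stage (recognition,
    machine translation, and speech synthesis)."""
    table = {"hello": "konnichiwa", "konnichiwa": "hello"}
    return table.get(word, word)

def synthesize_mouth(phoneme: str) -> str:
    """Stub for rendering the mouth from a 3-D wire-frame model fitted to
    the speaker, driven by the phonemes of the translated speech."""
    return f"wireframe:{phoneme}"

def lip_sync(frames: list[Frame], phonemes: list[str]) -> list[Frame]:
    """Replace only the mouth region of each frame; the rest of the face,
    and hence the speaker's expression, is left untouched."""
    return [Frame(face=f.face, mouth=synthesize_mouth(p))
            for f, p in zip(frames, phonemes)]

# Tiny demo: translate one word, then re-sync two frames to its phonemes.
translated = translate_speech("hello")
frames = [Frame("smile", "orig0"), Frame("smile", "orig1")]
synced = lip_sync(frames, ["ko", "N"])
```

The design point mirrored here is that only the mouth region is synthesized, which is why the approach needs only a small, speaker-adaptable model rather than a full database of face images.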

Original language: English
Title of host publication: Proceedings - 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 241-246
Number of pages: 6
ISBN (Print): 0769518346, 9780769518343
DOI: 10.1109/ICMI.2002.1167000
Publication status: Published - 2002
Externally published: Yes
Event: 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002 - Pittsburgh, United States
Duration: 2002 Oct 14 - 2002 Oct 16



ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture

Cite this

Morishima, S., & Nakamura, S. (2002). Multi-modal translation system and its evaluation. In Proceedings - 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002 (pp. 241-246). [1167000] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICMI.2002.1167000
