Model-based talking face synthesis for anthropomorphic spoken dialog agent system

Tatsuo Yotsukura, Shigeo Morishima, Satoshi Nakamura

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Citations (Scopus)

Abstract

Toward natural human-machine communication, interface technologies that use speech and image information have been intensively developed. An anthropomorphic dialog agent, which integrates spoken dialog with natural facial expressions, is an ideal system for this purpose. This paper reports on our project to create a general-purpose toolkit for building easily customizable anthropomorphic agents. So far, there have been almost no tools that are intuitive, easy to understand, fully interactive, and open source; our anthropomorphic agent is designed to fulfill these requirements. The toolkit consists of four modules: multimodal dialog integration, speech recognition, speech synthesis, and face image synthesis. These modules are highly modular and are interlinked by a simple communication protocol. In this paper, we focus on the construction of the agent's face image synthesis, for which the most important parts are lip movement control synchronized with the speech signal and facial emotion expression. We developed a face image synthesis module (FSM) that requires only one frontal face image and can be used by users of any skill level. A user's original agent can be generated simply by adjusting the frontal face image and the generic wire-frame model. The paper describes the overall system diagram and, in particular, the agent's face image synthesis part.
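The abstract says an agent is built by adjusting a generic wire-frame model to a single frontal face image, but does not describe how that adjustment is computed. As a rough illustration only, the sketch below assumes a global 2D affine fit as a first step: a few feature vertices of the generic model (for example eye and mouth corners) are paired with landmark points marked on the photo, a least-squares affine map is estimated from those pairs, and it is applied to every vertex. The affine model and the names `fit_affine` and `fit_wireframe` are hypothetical and are not the FSM's actual procedure; finer, region-by-region adjustment would still be needed for a usable face.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _solve3(m: List[List[float]], r: List[float]) -> List[float]:
    """Solve the 3x3 system m @ p = r by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rv] for row, rv in zip(m, r)]  # augmented copy; inputs are not modified
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate landmarks (e.g. all collinear)")
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(col + 1, n):
            f = a[i][col] / a[col][col]
            for j in range(col, n + 1):
                a[i][j] -= f * a[col][j]
    p = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        p[i] = (a[i][n] - sum(a[i][j] * p[j] for j in range(i + 1, n))) / a[i][i]
    return p


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> Tuple[List[float], List[float]]:
    """Least-squares 2D affine map src -> dst.

    Returns (pu, pv) with u = pu[0]*x + pu[1]*y + pu[2] and v = pv[0]*x + pv[1]*y + pv[2].
    """
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least three corresponding landmarks")
    # Accumulate the normal equations (M^T M) p = M^T t, where each row of M is [x, y, 1].
    mtm = [[0.0] * 3 for _ in range(3)]
    mtu = [0.0] * 3
    mtv = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                mtm[i][j] += row[i] * row[j]
            mtu[i] += row[i] * u
            mtv[i] += row[i] * v
    return _solve3(mtm, mtu), _solve3(mtm, mtv)


def fit_wireframe(vertices: Sequence[Point], feature_ids: Sequence[int],
                  landmarks: Sequence[Point]) -> List[Point]:
    """Move every generic-model vertex by the affine map fitted on its feature vertices."""
    pu, pv = fit_affine([vertices[i] for i in feature_ids], landmarks)
    return [(pu[0] * x + pu[1] * y + pu[2], pv[0] * x + pv[1] * y + pv[2]) for x, y in vertices]
```

For example, mapping the four corners of a unit-square "model" onto a square twice as large with its corner at (10, 20) moves the center vertex (0.5, 0.5) to (11, 21). A global affine fit captures position, scale, and rotation of the face in the photo, which is why it is a common starting point before local edits.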

Original language: English
Title of host publication: Proceedings of the ACM International Multimedia Conference and Exhibition
Pages: 351-354
Number of pages: 4
Publication status: Published - 2003
Externally published: Yes
Event: 2003 Multimedia Conference - Proceedings of the 11th ACM International Conference on Multimedia, MM'03 - Berkeley, CA.
Duration: 2003 Nov 4 to 2003 Nov 6

Keywords

  • Anthropomorphic Dialog Agent
  • Face Image Synthesis
  • Facial Animation
  • Lip Synchronization

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Yotsukura, T., Morishima, S., & Nakamura, S. (2003). Model-based talking face synthesis for anthropomorphic spoken dialog agent system. In Proceedings of the ACM International Multimedia Conference and Exhibition (pp. 351-354).

@inproceedings{38fcd6846fe644448315bf45bfd93c91,
title = "Model-based talking face synthesis for anthropomorphic spoken dialog agent system",
keywords = "Anthropomorphic Dialog Agent, Face Image Synthesis, Facial Animation, Lip Synchronization",
author = "Tatsuo Yotsukura and Shigeo Morishima and Satoshi Nakamura",
year = "2003",
language = "English",
pages = "351--354",
booktitle = "Proceedings of the ACM International Multimedia Conference and Exhibition",

}

TY - GEN

T1 - Model-based talking face synthesis for anthropomorphic spoken dialog agent system

AU - Yotsukura, Tatsuo

AU - Morishima, Shigeo

AU - Nakamura, Satoshi

PY - 2003

Y1 - 2003

KW - Anthropomorphic Dialog Agent

KW - Face Image Synthesis

KW - Facial Animation

KW - Lip Synchronization

UR - http://www.scopus.com/inward/record.url?scp=2342535715&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2342535715&partnerID=8YFLogxK

M3 - Conference contribution

SP - 351

EP - 354

BT - Proceedings of the ACM International Multimedia Conference and Exhibition

ER -