Song2Face: Synthesizing Singing Facial Animation from Audio

Shohei Iwase, Takuya Kato, Shugo Yamaguchi, Tsuchiya Yukitaka, Shigeo Morishima

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

We present Song2Face, a deep neural network capable of producing singing facial animation from an input of singing voice and singer label. The network architecture is built upon our insight that, although facial expression when singing varies between individuals, singing voices carry valuable information, such as pitch, breath, and vibrato, to which expressions may be attributed. Therefore, our network consists of an encoder that extracts relevant vocal features from audio, and a regression network conditioned on a singer label that predicts control parameters for facial animation. In contrast to prior audio-driven speech animation methods, which first map audio to text-level features, we show that vocal features can be learned directly from singing voice without any explicit constraints. Our network is capable of producing movements for all parts of the face as well as rotational movement of the head itself. Furthermore, stylistic differences in expression between singers are captured via the singer label, and thus the resulting animation's singing style can be manipulated at test time.
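
The two-stage structure described in the abstract (an audio encoder followed by a singer-conditioned regression network) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify layer types, sizes, or input features, so every name and dimension below (`N_AUDIO_FEATS`, `N_VOCAL_FEATS`, `N_SINGERS`, `N_CONTROLS`, the single-layer encoder, the one-hot singer label) is a hypothetical stand-in chosen only to show how a singer label might condition the output.

```python
import numpy as np

# All sizes are illustrative assumptions, not values from the paper.
N_AUDIO_FEATS = 40   # per-frame audio features (assumed, e.g. a spectrogram slice)
N_VOCAL_FEATS = 16   # size of the learned vocal-feature vector (assumed)
N_SINGERS = 3        # number of singer labels (assumed)
N_CONTROLS = 20      # facial control parameters plus head rotation (assumed)

rng = np.random.default_rng(0)


def init_params():
    """Randomly initialised weights for the two hypothetical stages."""
    return {
        "enc_w": rng.normal(0, 0.1, (N_AUDIO_FEATS, N_VOCAL_FEATS)),
        "enc_b": np.zeros(N_VOCAL_FEATS),
        "reg_w": rng.normal(0, 0.1, (N_VOCAL_FEATS + N_SINGERS, N_CONTROLS)),
        "reg_b": np.zeros(N_CONTROLS),
    }


def encode(params, audio_frames):
    """Encoder stand-in: map (T, N_AUDIO_FEATS) frames to (T, N_VOCAL_FEATS)."""
    return np.tanh(audio_frames @ params["enc_w"] + params["enc_b"])


def regress(params, vocal_feats, singer_id):
    """Regression stand-in, conditioned on a one-hot singer label."""
    one_hot = np.zeros(N_SINGERS)
    one_hot[singer_id] = 1.0
    # Repeat the singer label for every frame and append it to the features.
    label = np.tile(one_hot, (vocal_feats.shape[0], 1))
    x = np.concatenate([vocal_feats, label], axis=1)
    return x @ params["reg_w"] + params["reg_b"]


def song2face_sketch(params, audio_frames, singer_id):
    """Audio frames + singer label -> per-frame control parameters."""
    return regress(params, encode(params, audio_frames), singer_id)


params = init_params()
audio = rng.normal(size=(100, N_AUDIO_FEATS))  # 100 frames of stand-in features
anim_a = song2face_sketch(params, audio, singer_id=0)
anim_b = song2face_sketch(params, audio, singer_id=1)
print(anim_a.shape)  # (100, 20)
```

Calling the sketch twice on the same audio with different `singer_id` values yields different control parameters, which mirrors the abstract's claim that singing style can be manipulated at test time by changing the singer label. A trained model would learn these weights from data rather than draw them at random.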

Original language: English
Title of host publication: SIGGRAPH Asia 2020 Technical Communications, SA 2020
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450380805
Publication status: Published - 2020 Dec 1
Event: SIGGRAPH Asia 2020 Technical Communications - International Conference on Computer Graphics and Interactive Techniques, SA 2020 - Virtual, Online, Korea, Republic of
Duration: 2020 Dec 4 to 2020 Dec 13

Publication series

Name: SIGGRAPH Asia 2020 Technical Communications, SA 2020

Conference

Conference: SIGGRAPH Asia 2020 Technical Communications - International Conference on Computer Graphics and Interactive Techniques, SA 2020
Country/Territory: Korea, Republic of
City: Virtual, Online
Period: 20/12/4 to 20/12/13

Keywords

  • Facial Animation
  • Machine Learning
  • Singing Audio

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Software
  • Computer Vision and Pattern Recognition
