Efficient speech animation synthesis with vocalic lip shapes

Daisuke Mima, Akinobu Maejima, Shigeo Morishima

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Computer-generated speech animations are commonly seen in video games and movies. Although high-quality facial motions can be created by the handcrafted work of skilled artists, this approach is not always suitable because of time and cost constraints. A data-driven approach [Taylor et al. 2012], such as machine learning to concatenate video portions of speech training data, has been utilized to generate natural speech animation, although a large number of target shapes is often required for synthesis. Smooth mouth motions can be obtained from prepared lip shapes for typical vowels by interpolating lip shapes with Gaussian mixture models (GMMs) [Yano et al. 2007]. However, the resulting animation is not directly generated from the measured lip motions of someone's actual speech.
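    The GMM-based interpolation the abstract refers to can be illustrated with a minimal sketch: blend prototype lip shapes for the five vowels by the posterior responsibilities of a per-vowel Gaussian over an acoustic feature. Everything here is an assumption for illustration — the vowel set, the 1-D first-formant feature, the Gaussian parameters, and the landmark shapes are all hypothetical and not taken from the paper or from [Yano et al. 2007].

    ```python
    import numpy as np

    # Hypothetical prototype lip shapes: one 2-D landmark set per vowel.
    # Random placeholders stand in for artist-prepared key shapes.
    VOWELS = ["a", "i", "u", "e", "o"]
    rng = np.random.default_rng(0)
    lip_shapes = {v: rng.normal(size=(20, 2)) for v in VOWELS}  # 20 landmarks each

    # One Gaussian per vowel over an assumed 1-D acoustic feature
    # (e.g., first-formant frequency in Hz); values are illustrative only.
    means = np.array([800.0, 300.0, 350.0, 500.0, 550.0])
    stds = np.array([80.0, 40.0, 50.0, 60.0, 60.0])
    priors = np.full(5, 1.0 / 5.0)

    def blend_lip_shape(feature):
        """Blend vowel lip shapes by GMM posterior responsibilities."""
        # Unnormalized component likelihoods under each vowel's Gaussian.
        lik = priors * np.exp(-0.5 * ((feature - means) / stds) ** 2) / stds
        post = lik / lik.sum()  # posterior P(vowel | feature), sums to 1
        # A posterior-weighted sum of prototypes gives a smooth in-between shape.
        return sum(p * lip_shapes[v] for p, v in zip(post, VOWELS))

    # A feature value between the assumed /a/ and /e/ centers yields a blend.
    shape = blend_lip_shape(650.0)
    ```

    Sweeping the feature over time would then trace a smooth trajectory through the prototype shapes, which is the appeal of interpolation over concatenating large numbers of target shapes.
    
    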

    Original language: English
    Title of host publication: ACM SIGGRAPH 2013 Posters, SIGGRAPH 2013
    DOIs: https://doi.org/10.1145/2503385.2503388
    Publication status: Published - 2013
    Event: ACM Special Interest Group on Computer Graphics and Interactive Techniques Conference, SIGGRAPH 2013 - Anaheim, CA
    Duration: 2013 Jul 21 - 2013 Jul 25



    ASJC Scopus subject areas

    • Computer Graphics and Computer-Aided Design
    • Computer Vision and Pattern Recognition
    • Software

    Cite this

    Mima, D., Maejima, A., & Morishima, S. (2013). Efficient speech animation synthesis with vocalic lip shapes. In ACM SIGGRAPH 2013 Posters, SIGGRAPH 2013 (Article 2). https://doi.org/10.1145/2503385.2503388

    @inproceedings{da6833505200426284b0bf1351e49a32,
    title = "Efficient speech animation synthesis with vocalic lip shapes",
    abstract = "Computer-generated speech animations are commonly seen in video games and movies. Although high-quality facial motions can be created by the hand crafted work of skilled artists, this approach is not always suitable because of time and cost constraints. A data-driven approach [Taylor et al. 2012], such as machine learning to concatenate video portions of speech training data, has been utilized to generate natural speech animation, while a large number of target shapes are often required for synthesis. We can obtain smooth mouth motions from prepared lip shapes for typical vowels by using an interpolation of lip shapes with Gaussian mixture models (GMMs) [Yano et al. 2007]. However, the resulting animation is not directly generated from the measured lip motions of someone's actual speech.",
    author = "Daisuke Mima and Akinobu Maejima and Shigeo Morishima",
    year = "2013",
    doi = "10.1145/2503385.2503388",
    language = "English",
    isbn = "9781450323420",
    booktitle = "ACM SIGGRAPH 2013 Posters, SIGGRAPH 2013",

    }

    TY - GEN

    T1 - Efficient speech animation synthesis with vocalic lip shapes

    AU - Mima, Daisuke

    AU - Maejima, Akinobu

    AU - Morishima, Shigeo

    PY - 2013

    Y1 - 2013

    AB - Computer-generated speech animations are commonly seen in video games and movies. Although high-quality facial motions can be created by the hand crafted work of skilled artists, this approach is not always suitable because of time and cost constraints. A data-driven approach [Taylor et al. 2012], such as machine learning to concatenate video portions of speech training data, has been utilized to generate natural speech animation, while a large number of target shapes are often required for synthesis. We can obtain smooth mouth motions from prepared lip shapes for typical vowels by using an interpolation of lip shapes with Gaussian mixture models (GMMs) [Yano et al. 2007]. However, the resulting animation is not directly generated from the measured lip motions of someone's actual speech.

    UR - http://www.scopus.com/inward/record.url?scp=84881589143&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84881589143&partnerID=8YFLogxK

    U2 - 10.1145/2503385.2503388

    DO - 10.1145/2503385.2503388

    M3 - Conference contribution

    AN - SCOPUS:84881589143

    SN - 9781450323420

    BT - ACM SIGGRAPH 2013 Posters, SIGGRAPH 2013

    ER -