Automatic generation of photorealistic 3D inner mouth animation only from frontal images

Masahide Kawai, Tomoyori Iwao, Akinobu Maejima, Shigeo Morishima

    Research output: Contribution to journal › Article

    Abstract

    In this paper, we propose a novel method to generate highly photorealistic three-dimensional (3D) inner mouth animation that is well-fitted to an original ready-made speech animation using only frontal captured images and small-size databases. The algorithms are composed of quasi-3D model reconstruction and motion control of teeth and the tongue, and final compositing of photorealistic speech animation synthesis tailored to the original. In general, producing a satisfactory photorealistic appearance of the inner mouth that is synchronized with mouth movement is a very complicated and time-consuming task. This is because the tongue and mouth are too flexible and delicate to be modeled with the large number of meshes required. Therefore, in some cases, this process is omitted or replaced with a very simple generic model. Our proposed method, on the other hand, can automatically generate 3D inner mouth appearances by improving photorealism with only three inputs: an original tailor-made lip-sync animation, a single image of the speaker’s teeth, and a syllabic decomposition of the desired speech. The key idea of our proposed method is to combine 3D reconstruction and simulation with two-dimensional (2D) image processing using only the above three inputs, as well as a tongue database and mouth database. The satisfactory performance of our proposed method is illustrated by the significant improvement in picture quality of several tailor-made animations to a degree nearly equivalent to that of camera-captured videos.

    Original language: English
    Pages (from-to): 693-703
    Number of pages: 11
    Journal: Journal of Information Processing
    Volume: 23
    Issue number: 5
    DOI: 10.2197/ipsjjip.23.693
    Publication status: Published - 2015 Sep 15

    Fingerprint

    Animation
    Video cameras
    Motion control
    Image processing
    Decomposition

    Keywords

    • Inner mouth
    • Multi-view Detailization
    • Phoneme combination
    • Skull bone
    • Speech animation

    ASJC Scopus subject areas

    • Computer Science (all)

    Cite this

    Automatic generation of photorealistic 3D inner mouth animation only from frontal images. / Kawai, Masahide; Iwao, Tomoyori; Maejima, Akinobu; Morishima, Shigeo.

    In: Journal of Information Processing, Vol. 23, No. 5, 15.09.2015, p. 693-703.

    @article{5a3ef796a91a4062ae5280558162575b,
    title = "Automatic generation of photorealistic 3D inner mouth animation only from frontal images",
    keywords = "Inner mouth, Multi-view Detailization, Phoneme combination, Skull bone, Speech animation",
    author = "Masahide Kawai and Tomoyori Iwao and Akinobu Maejima and Shigeo Morishima",
    year = "2015",
    month = "9",
    day = "15",
    doi = "10.2197/ipsjjip.23.693",
    language = "English",
    volume = "23",
    pages = "693--703",
    journal = "Journal of Information Processing",
    issn = "0387-5806",
    publisher = "Information Processing Society of Japan",
    number = "5",

    }

    TY - JOUR

    T1 - Automatic generation of photorealistic 3D inner mouth animation only from frontal images

    AU - Kawai, Masahide

    AU - Iwao, Tomoyori

    AU - Maejima, Akinobu

    AU - Morishima, Shigeo

    PY - 2015/9/15

    Y1 - 2015/9/15

    KW - Inner mouth

    KW - Multi-view Detailization

    KW - Phoneme combination

    KW - Skull bone

    KW - Speech animation

    UR - http://www.scopus.com/inward/record.url?scp=84941551407&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84941551407&partnerID=8YFLogxK

    U2 - 10.2197/ipsjjip.23.693

    DO - 10.2197/ipsjjip.23.693

    M3 - Article

    VL - 23

    SP - 693

    EP - 703

    JO - Journal of Information Processing

    JF - Journal of Information Processing

    SN - 0387-5806

    IS - 5

    ER -