3D human head geometry estimation from a speech

Akinobu Maejima, Shigeo Morishima

    Research output: Conference contribution

    Abstract

    We can often picture an acquaintance's appearance just by hearing their voice, provided we have met them within the past few years. This suggests that some relationship exists between voice and appearance. If 3D head geometry could be estimated from a voice, applications such as avatar generation and character modeling for video games would become possible. Many researchers have reported on the relationship between the acoustic features of a voice and the corresponding dynamic visual features during speech, including lip, tongue, and jaw movements and vocal articulation; however, there have been few reports on the relationship between acoustic features and static 3D head geometry. In this paper, we focus on estimating 3D head geometry from a voice. Because acoustic features vary with speech context and intonation, we restrict the context to the five Japanese vowels. Under this assumption, we estimate 3D head geometry with a feedforward neural network (FNN) trained on correspondences between individual acoustic features extracted from a Japanese vowel and 3D head geometry generated from a 3D range scan. We demonstrate the performance of our method with both closed and open tests. We found that, under this limited condition, 3D head geometry that is acoustically similar to an input voice can be estimated.
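    The mapping described in the abstract can be sketched as follows. This is a hedged illustration, not the authors' implementation: the feature dimensions, network size, training procedure, and the geometry-coefficient output (e.g., PCA coefficients of registered head scans) are all assumptions, and the data below are synthetic stand-ins for real (vowel acoustics, range-scan geometry) pairs.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical dimensions (illustrative, not from the paper): a 13-dim
    # acoustic feature vector per vowel utterance, mapped to 20 coefficients
    # of a geometry model built from 3D range scans.
    N_SPEAKERS, ACOUSTIC_DIM, GEOMETRY_DIM, HIDDEN = 100, 13, 20, 32

    # Synthetic stand-in data: one (acoustic, geometry) pair per speaker.
    X = rng.normal(size=(N_SPEAKERS, ACOUSTIC_DIM))
    Y = rng.normal(size=(N_SPEAKERS, GEOMETRY_DIM))

    # One-hidden-layer feedforward network, trained by plain gradient descent
    # on the mean squared error between predicted and scanned geometry.
    W1 = rng.normal(scale=0.1, size=(ACOUSTIC_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
    W2 = rng.normal(scale=0.1, size=(HIDDEN, GEOMETRY_DIM)); b2 = np.zeros(GEOMETRY_DIM)

    def forward(A):
        h = np.tanh(A @ W1 + b1)
        return h, h @ W2 + b2

    loss_init = np.mean((forward(X)[1] - Y) ** 2)
    lr = 0.05
    for _ in range(500):
        h, pred = forward(X)
        err = pred - Y
        dh = (err @ W2.T) * (1.0 - h ** 2)   # backprop through tanh
        W2 -= lr * (h.T @ err) / N_SPEAKERS
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * (X.T @ dh) / N_SPEAKERS
        b1 -= lr * dh.mean(axis=0)
    loss_final = np.mean((forward(X)[1] - Y) ** 2)

    # "Closed test" in the abstract's sense: estimate geometry coefficients
    # for a speaker whose data were in the training set.
    estimate = forward(X[:1])[1]
    print(estimate.shape)
    ```

    An "open test" would instead feed acoustic features from a speaker held out of training; the estimated coefficients would then be decoded through the geometry model to produce a head mesh.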

    Original language: English
    Host publication title: ACM SIGGRAPH 2012 Posters, SIGGRAPH'12
    DOI: 10.1145/2342896.2342997
    Publication status: Published - 2012
    Event: ACM Special Interest Group on Computer Graphics and Interactive Techniques Conference, SIGGRAPH'12 - Los Angeles, CA
    Duration: 5 Aug 2012 - 9 Aug 2012



    ASJC Scopus subject areas

    • Computer Graphics and Computer-Aided Design
    • Computer Vision and Pattern Recognition
    • Software

    Cite this

    Maejima, A., & Morishima, S. (2012). 3D human head geometry estimation from a speech. In ACM SIGGRAPH 2012 Posters, SIGGRAPH'12. https://doi.org/10.1145/2342896.2342997
    @inproceedings{937754b1927c4f24aa439a62b408d6b4,
    title = "3D human head geometry estimation from a speech",
    author = "Akinobu Maejima and Shigeo Morishima",
    year = "2012",
    doi = "10.1145/2342896.2342997",
    language = "English",
    isbn = "9781450316828",
    booktitle = "ACM SIGGRAPH 2012 Posters, SIGGRAPH'12",

    }

    TY - GEN

    T1 - 3D human head geometry estimation from a speech

    AU - Maejima, Akinobu

    AU - Morishima, Shigeo

    PY - 2012

    Y1 - 2012


    UR - http://www.scopus.com/inward/record.url?scp=84865628033&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84865628033&partnerID=8YFLogxK

    U2 - 10.1145/2342896.2342997

    DO - 10.1145/2342896.2342997

    M3 - Conference contribution

    AN - SCOPUS:84865628033

    SN - 9781450316828

    BT - ACM SIGGRAPH 2012 Posters, SIGGRAPH'12

    ER -