Subjective evaluation of a synthetic talking face in an acoustically noisy environment

Akinobu Maejima, Tatsuo Yotsukura, Shigeo Morishima, Satoshi Nakamura

    Research output: Contribution to journal (Article)

    Abstract

    Realizing an anthropomorphic agent that looks like a real human is an important research topic for broadening the range of computer-mediated human-to-human communication. We have proposed a technique for synthesizing natural talking-face animation that permits such communication. How to evaluate the performance of talking-face animation, however, has remained an open issue. The performance of talking-face animation is determined by three criteria: (1) Does it reproduce human speech well enough to permit lipreading? (2) Does it appear visually natural? (3) Is it accurately synchronized with the voice? In this paper, to examine Criterion (1), we first presented talking-face animation together with the voice to subjects and measured how well they could hear the spoken words. Next, for Criterion (2), the visual naturalness of the talking-face animation and the smoothness of the mouth motion were rated on a 5-point scale. Lastly, for Criterion (3), subjects were shown talking-face animation whose synchronization with the sound was offset by a fixed interval, in order to investigate the subjective perception of the synchronization gap, and the degree of unnaturalness they felt was rated on a 5-point scale. In addition, the effect of the synchronization gap between voice and animation on the intelligibility of the spoken words was evaluated. Through these experiments, we assessed the quality of our proposed synthetic talking-face animation and studied what synchronization between synthetic talking-face animation and voice appears natural.

    Original language: English
    Pages (from-to): 39-52
    Number of pages: 14
    Journal: Electronics and Communications in Japan, Part III: Fundamental Electronic Science (English translation of Denshi Tsushin Gakkai Ronbunshi)
    Volume: 89
    Issue number: 5
    DOIs: 10.1002/ecjc.20180
    Publication status: Published - May 2006

    Keywords

    • Naturalness
    • Noisy environment
    • Reproducibility
    • Synthetic talking-face animation
    • Voice synchronization

    ASJC Scopus subject areas

    • Electrical and Electronic Engineering

    Cite this

    @article{78622959d5fb44efa687383d04a6b3c0,
    title = "Subjective evaluation of a synthetic talking face in an acoustically noisy environment",
    abstract = "The realization of an anthropomorphic agent which looks like a real human is an important research topic for the broadening of the range of human-to-human communications through the use of a computer. We have proposed a technique for synthesizing natural talking-face animation that permits such communications. How to evaluate the performance of talking-face animation, however, has remained an outstanding issue. The performance of talking-face animation is determined in three parameters: (1) Does it reproduce human talking to an extent that permits lipreading? (2) Does it appear visually natural? (3) Is it accurately synchronized with voice? In this paper, we first presented talking-face animation along with the voice to subjects and conducted experiments on how well the subjects heard the contents of the spoken words to examine Parameter (1). In the next step, with regard to Parameter (2), the visual naturalness of the talking-face animation and the smoothness of the motion of the talking mouth were evaluated on a scale of 5 points. Lastly, with regard to Parameter (3), talking-face animation in which the synchronization of the animation with sound was off by a fixed interval was shown to subjects to investigate the subjective perception of the synchronization gap, and the extent of the resulting strange feeling was evaluated on a scale of 5 points. In addition, the effect of the synchronization gap between voice and talking-face animation on the manner in which the spoken words are understood was also evaluated. Through these evaluation experiments, the quality of synthetic talking-face animation proposed by the authors was evaluated, and we studied naturally-appearing synchronization between synthetic talking-face animation and voice.",
    keywords = "Naturalness, Noisy environment, Reproducibility, Synthetic talking-face animation, Voice synchronization",
    author = "Akinobu Maejima and Tatsuo Yotsukura and Shigeo Morishima and Satoshi Nakamura",
    year = "2006",
    month = may,
    doi = "10.1002/ecjc.20180",
    language = "English",
    volume = "89",
    pages = "39--52",
    journal = "Electronics and Communications in Japan, Part III: Fundamental Electronic Science (English translation of Denshi Tsushin Gakkai Ronbunshi)",
    issn = "1042-0967",
    publisher = "John Wiley and Sons Inc.",
    number = "5",
    }

    TY - JOUR

    T1 - Subjective evaluation of a synthetic talking face in an acoustically noisy environment

    AU - Maejima, Akinobu

    AU - Yotsukura, Tatsuo

    AU - Morishima, Shigeo

    AU - Nakamura, Satoshi

    PY - 2006/5

    Y1 - 2006/5

    AB - The realization of an anthropomorphic agent which looks like a real human is an important research topic for the broadening of the range of human-to-human communications through the use of a computer. We have proposed a technique for synthesizing natural talking-face animation that permits such communications. How to evaluate the performance of talking-face animation, however, has remained an outstanding issue. The performance of talking-face animation is determined in three parameters: (1) Does it reproduce human talking to an extent that permits lipreading? (2) Does it appear visually natural? (3) Is it accurately synchronized with voice? In this paper, we first presented talking-face animation along with the voice to subjects and conducted experiments on how well the subjects heard the contents of the spoken words to examine Parameter (1). In the next step, with regard to Parameter (2), the visual naturalness of the talking-face animation and the smoothness of the motion of the talking mouth were evaluated on a scale of 5 points. Lastly, with regard to Parameter (3), talking-face animation in which the synchronization of the animation with sound was off by a fixed interval was shown to subjects to investigate the subjective perception of the synchronization gap, and the extent of the resulting strange feeling was evaluated on a scale of 5 points. In addition, the effect of the synchronization gap between voice and talking-face animation on the manner in which the spoken words are understood was also evaluated. Through these evaluation experiments, the quality of synthetic talking-face animation proposed by the authors was evaluated, and we studied naturally-appearing synchronization between synthetic talking-face animation and voice.

    KW - Naturalness

    KW - Noisy environment

    KW - Reproducibility

    KW - Synthetic talking-face animation

    KW - Voice synchronization

    UR - http://www.scopus.com/inward/record.url?scp=32344438430&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=32344438430&partnerID=8YFLogxK

    U2 - 10.1002/ecjc.20180

    DO - 10.1002/ecjc.20180

    M3 - Article

    VL - 89

    SP - 39

    EP - 52

    JO - Electronics and Communications in Japan, Part III: Fundamental Electronic Science (English translation of Denshi Tsushin Gakkai Ronbunshi)

    JF - Electronics and Communications in Japan, Part III: Fundamental Electronic Science (English translation of Denshi Tsushin Gakkai Ronbunshi)

    SN - 1042-0967

    IS - 5

    ER -