Recognition of para-linguistic information and its application to spoken dialogue system

Shinya Fujie, Yasushi Ejiri, Yosuke Matsusaka, Hideaki Kikuchi, Tetsunori Kobayashi

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    10 Citations (Scopus)

    Abstract

    Human-human spoken dialogue appears to rely not only on the linguistic information in utterances but also on additional information that supports it. We call this additional information "para-linguistic information". In this paper, we present a method for recognizing attitudes from prosodic information and a method for recognizing head gestures. The former recognizes two attitudes, "positive" and "negative", using the F0 pattern and phoneme alignment as features. The latter recognizes three gestures, "nod", "tilt", and "shake", using a left-to-right HMM as the probabilistic model and optical flow as features. Experimental results show that these methods are sufficient to recognize the user's attitude as para-linguistic information. Finally, we present a prototype spoken dialogue system that uses para-linguistic information and show how it contributes to efficient conversation.
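    The head-gesture recognizer described in the abstract can be sketched in code. This is an illustrative reconstruction, not the authors' implementation: the left-to-right HMM topology and the use of optical-flow features follow the abstract, but the state count, the Gaussian emission model, the self-transition probability, and all toy parameters below are assumptions made for the demo.

```python
# Sketch: classify a head gesture ("nod", "tilt", "shake") by scoring an
# optical-flow feature sequence against one left-to-right HMM per gesture
# and picking the highest-likelihood model.
import numpy as np

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian (all shapes (D,))."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def forward_loglik(obs, means, vars_, self_p=0.6):
    """Forward algorithm for a left-to-right HMM.

    obs: (T, D) feature sequence (e.g. per-frame mean optical flow).
    means, vars_: (S, D) per-state Gaussian emission parameters.
    Left-to-right topology: each state either stays (prob self_p)
    or advances to the next state; the sequence must start in state 0
    and end in the final state.
    """
    S = means.shape[0]
    log_self, log_next = np.log(self_p), np.log(1.0 - self_p)
    alpha = np.full(S, -np.inf)
    alpha[0] = log_gauss(obs[0], means[0], vars_[0])
    for x in obs[1:]:
        new = np.full(S, -np.inf)
        for s in range(S):
            stay = alpha[s] + log_self
            come = alpha[s - 1] + log_next if s > 0 else -np.inf
            new[s] = np.logaddexp(stay, come) + log_gauss(x, means[s], vars_[s])
        alpha = new
    return alpha[-1]

def classify(obs, models):
    """Return the gesture whose HMM assigns the highest log-likelihood."""
    return max(models, key=lambda g: forward_loglik(obs, *models[g]))

# Toy 3-state models; feature = (horizontal, vertical) flow. A nod is a
# vertical down-up oscillation, a shake a horizontal one, a tilt a lean.
models = {
    "nod":   (np.array([[0.0, 1.0], [0.0, -1.0], [0.0, 1.0]]), np.ones((3, 2))),
    "shake": (np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]]), np.ones((3, 2))),
    "tilt":  (np.array([[0.7, 0.7], [0.0, 0.0], [-0.7, -0.7]]), np.ones((3, 2))),
}
obs = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, -1.0],
                [0.0, -1.0], [0.0, 1.0], [0.0, 1.0]])
print(classify(obs, models))  # → nod
```

    In practice the attitude recognizer would run in parallel, scoring the same utterance's F0 contour against the phoneme alignment; the paper reports that combining both channels is sufficient to recover the user's attitude.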

    Original language: English
    Title of host publication: 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 231-236
    Number of pages: 6
    ISBN (Print): 0780379802, 9780780379800
    DOI: 10.1109/ASRU.2003.1318446
    Publication status: Published - 2003
    Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003 - St. Thomas, United States
    Duration: 2003 Nov 30 - 2003 Dec 4


    Fingerprint

    Linguistics
    Optical flows
    Experiments

    ASJC Scopus subject areas

    • Signal Processing
    • Computer Vision and Pattern Recognition
    • Computer Science Applications

    Cite this

    Fujie, S., Ejiri, Y., Matsusaka, Y., Kikuchi, H., & Kobayashi, T. (2003). Recognition of para-linguistic information and its application to spoken dialogue system. In 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003 (pp. 231-236). [1318446] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU.2003.1318446
