Recognition of positive/negative attitude and its application to a spoken dialogue system

Shinya Fujie, Yasushi Ejiri, Hideaki Kikuchi, Tetsunori Kobayashi

    Research output: Contribution to journal, Article

    5 Citations (Scopus)

    Abstract

    Spoken dialogue between humans is based on the linguistic information contained in utterances. In addition, the speaker's psychological state and information that complements the dialogue are conveyed by prosody, facial expression, and head movement, which help the dialogue proceed smoothly. Such information, which co-occurs with the utterance and supports the smooth transmission of linguistic information, is called paralinguistic information. This paper treats the speaker's attitude, as expressed by prosody and head gestures, as paralinguistic information. Methods for recognizing each of these cues are proposed, and a dialogue robot is built on the basis of the proposed methods. For prosody, the positive or negative attitude of an utterance is recognized on the basis of the F0 pattern and the phoneme duration. For head gestures, a nod is defined as representing a positive attitude, and a tilt or shake of the head as representing a negative attitude. These three motions are recognized using optical flow as the feature parameters and a hidden Markov model (HMM) as the stochastic model. It is shown experimentally that the proposed method achieves the same recognition ability as humans. It is also shown that a dialogue robot incorporating the proposed method achieves rhythmic, efficient dialogue, which had not been achieved previously.
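
As a rough illustration of the head-gesture step described in the abstract, the sketch below scores a sequence of quantized optical-flow symbols against one small discrete HMM per gesture and picks the most likely one. The symbol alphabet, the model sizes, and every probability are made-up assumptions for illustration only; this record does not give the paper's actual features, model topology, or trained parameters.

```python
import math

# Hypothetical symbol alphabet (an assumption, not from the paper): the
# dominant motion of the per-frame optical flow, quantized to five classes.
UP, DOWN, LEFT, RIGHT, ROLL = range(5)


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    obs must be a non-empty list of symbol indices.
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for t, symbol in enumerate(obs):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][symbol]
                for s in range(n)
            ]
        # Rescale at every step to avoid underflow on long sequences.
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p


def alternating(a, b):
    """Two-state toy HMM whose states favor symbols a and b in turn."""
    def row(sym):
        return [0.6 if k == sym else 0.1 for k in range(5)]
    return ([0.5, 0.5], [[0.3, 0.7], [0.7, 0.3]], [row(a), row(b)])


# Illustrative, hand-set models; a real system would train these from data.
MODELS = {
    "nod": alternating(UP, DOWN),
    "shake": alternating(LEFT, RIGHT),
    "tilt": ([1.0], [[1.0]], [[0.1, 0.1, 0.1, 0.1, 0.6]]),
}

# Gesture-to-attitude mapping as stated in the abstract.
ATTITUDE = {"nod": "positive", "shake": "negative", "tilt": "negative"}


def classify(obs):
    """Return (gesture, attitude) for the model with the highest likelihood."""
    best = max(MODELS, key=lambda g: log_likelihood(obs, *MODELS[g]))
    return best, ATTITUDE[best]
```

With these toy parameters, a sequence that alternates between upward and downward flow is assigned to the nod model and therefore to a positive attitude, while left/right alternation or sustained rotation maps to a negative attitude.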

    Original language: English
    Pages (from-to): 45-55
    Number of pages: 11
    Journal: Systems and Computers in Japan
    Volume: 37
    Issue number: 12
    DOI: 10.1002/scj.20508
    Publication status: Published - 2006 Nov 15

    Fingerprint

    Spoken Dialogue Systems
    Linguistics
    Robots
    Optical flows
    Prosody
    Stochastic models
    Gesture
    Robot
    Facial Expression
    Optical Flow
    Tilt
    Stochastic Model
    Dialogue
    Motion

    Keywords

    • Head gesture
    • Paralinguistic information
    • Prosody
    • Spoken dialogue

    ASJC Scopus subject areas

    • Hardware and Architecture
    • Information Systems
    • Theoretical Computer Science
    • Computational Theory and Mathematics

    Cite this

    Recognition of positive/negative attitude and its application to a spoken dialogue system. / Fujie, Shinya; Ejiri, Yasushi; Kikuchi, Hideaki; Kobayashi, Tetsunori.

    In: Systems and Computers in Japan, Vol. 37, No. 12, 15.11.2006, p. 45-55.

    @article{7dea406dc06449ee857f55088042e138,
    title = "Recognition of positive/negative attitude and its application to a spoken dialogue system",
    keywords = "Head gesture, Paralinguistic information, Prosody, Spoken dialogue",
    author = "Shinya Fujie and Yasushi Ejiri and Hideaki Kikuchi and Tetsunori Kobayashi",
    year = "2006",
    month = "11",
    day = "15",
    doi = "10.1002/scj.20508",
    language = "English",
    volume = "37",
    pages = "45--55",
    journal = "Systems and Computers in Japan",
    issn = "0882-1666",
    publisher = "John Wiley and Sons Inc.",
    number = "12",

    }

    TY - JOUR

    T1 - Recognition of positive/negative attitude and its application to a spoken dialogue system

    AU - Fujie, Shinya

    AU - Ejiri, Yasushi

    AU - Kikuchi, Hideaki

    AU - Kobayashi, Tetsunori

    PY - 2006/11/15

    Y1 - 2006/11/15

    KW - Head gesture

    KW - Paralinguistic information

    KW - Prosody

    KW - Spoken dialogue

    UR - http://www.scopus.com/inward/record.url?scp=33750145461&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=33750145461&partnerID=8YFLogxK

    U2 - 10.1002/scj.20508

    DO - 10.1002/scj.20508

    M3 - Article

    VL - 37

    SP - 45

    EP - 55

    JO - Systems and Computers in Japan

    JF - Systems and Computers in Japan

    SN - 0882-1666

    IS - 12

    ER -