Classification of video shots based on human affect

Kok Meng Ong, Wataru Kameyama

    Research output: Contribution to journal › Article

    4 Citations (Scopus)

    Abstract

    This study addresses the challenge of analyzing affective video content. The affective content of a given video is defined as the intensity and type of emotion that arise in a viewer while watching that video. In this study, human emotion was monitored by capturing viewers' pupil sizes and gaze points while they watched the video. From these measurements, four features were extracted: cumulative pupil response (CPR), frequency component (FC), modified bivariate contour ellipse area (mBVCEA), and the Gini coefficient. Using principal component analysis, we found that two of these features, CPR and FC, account for the majority of the variance in the data. With these key features, the affective content was identified and used to classify the video shots into their respective scenes. An average classification accuracy of 71.89% was achieved for three basic emotions, with an individual maximum classification accuracy of 89.06%. This study serves as a first step toward automating personalized video content analysis on the basis of human emotion.
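
    The pipeline summarized above (per-shot ocular features, principal component analysis to find the dominant features, then shot classification) can be sketched in a few lines. The Python sketch below is a hypothetical illustration only: it assumes per-shot values of CPR, FC, mBVCEA and the Gini coefficient have already been extracted, uses fabricated placeholder data, and substitutes a simple nearest-neighbour classifier, since the abstract does not specify the classifier used.

        # Hypothetical sketch of the pipeline described in the abstract:
        # PCA over four per-shot ocular features, then shot classification.
        # All numbers below are fabricated placeholders, not data from the paper.
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.neighbors import KNeighborsClassifier

        # Rows: video shots; columns: CPR, FC, mBVCEA, Gini coefficient.
        X = np.array([
            [0.82, 0.31, 0.12, 0.44],
            [0.79, 0.35, 0.15, 0.41],
            [0.21, 0.88, 0.33, 0.52],
            [0.25, 0.84, 0.30, 0.55],
            [0.50, 0.10, 0.72, 0.63],
            [0.47, 0.12, 0.75, 0.60],
        ])
        y = np.array([0, 0, 1, 1, 2, 2])  # labels for three basic emotions

        # Explained-variance ratios show how much variance each principal
        # component carries; the component loadings show how strongly each
        # original feature (CPR, FC, mBVCEA, Gini) contributes to it.
        pca = PCA(n_components=2).fit(X)
        print("explained variance ratio:", pca.explained_variance_ratio_)
        print("feature loadings:\n", pca.components_)

        # Classify shots in the reduced two-dimensional feature space.
        Z = pca.transform(X)
        clf = KNeighborsClassifier(n_neighbors=1).fit(Z, y)
        print("predicted emotions:", clf.predict(Z))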

    Original language: English
    Pages (from-to): 847-856
    Number of pages: 10
    Journal: Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers
    Volume: 63
    Issue number: 6
    DOIs: 10.3169/itej.63.847
    Publication status: Published - 2009

    Keywords

    • Emotion
    • Gaze
    • Pupil
    • Video content classification

    ASJC Scopus subject areas

    • Electrical and Electronic Engineering
    • Media Technology
    • Computer Science Applications

    Cite this

    @article{6c1638fbf599439d8e61aa28094b8ce6,
    title = "Classification of video shots based on human affect",
    abstract = "This study addresses the challenge of analyzing affective video content. The affective content of a given video is defined as the intensity and type of emotion that arise in a viewer while watching that video. In this study, human emotion was monitored by capturing viewers' pupil sizes and gaze points while they watched the video. From these measurements, four features were extracted: cumulative pupil response (CPR), frequency component (FC), modified bivariate contour ellipse area (mBVCEA), and the Gini coefficient. Using principal component analysis, we found that two of these features, CPR and FC, account for the majority of the variance in the data. With these key features, the affective content was identified and used to classify the video shots into their respective scenes. An average classification accuracy of 71.89\% was achieved for three basic emotions, with an individual maximum classification accuracy of 89.06\%. This study serves as a first step toward automating personalized video content analysis on the basis of human emotion.",
    keywords = "Emotion, Gaze, Pupil, Video content classification",
    author = "Ong, {Kok Meng} and Wataru Kameyama",
    year = "2009",
    doi = "10.3169/itej.63.847",
    language = "English",
    volume = "63",
    pages = "847--856",
    journal = "Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers",
    issn = "1342-6907",
    publisher = "Institute of Image Information and Television Engineers",
    number = "6",
    }
