Improving semantic video indexing

Efforts in Waseda TRECVID 2015 SIN system

Kazuya Ueki, Tetsunori Kobayashi

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    In this paper, we propose a method for improving the performance of semantic video indexing. Our approach involves extracting features from multiple convolutional neural networks (CNNs), creating multiple classifiers, and integrating them. We employed four measures to accomplish this: (1) utilizing the multiple pieces of evidence observed in each video and effectively compressing them into a fixed-length vector; (2) introducing gradient and motion features to CNNs; (3) enriching the variations of the training and testing sets; and (4) extracting features from several CNNs trained with various large-scale datasets. Using the test dataset from TRECVID's 2014 evaluation benchmark, we evaluated the performance of the proposed method in terms of the mean extended inferred average precision. On this measure, our system scored 35.7, outperforming the state-of-the-art TRECVID 2014 benchmark performance of 33.2. Based on this work, our submission to TRECVID 2015 was ranked second among all submissions.
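
    The abstract describes the pipeline only at a high level. The sketch below is a minimal, hypothetical illustration of the general idea (pooling per-frame CNN features into a fixed-length clip vector, training one classifier per feature type, and fusing their scores); it uses random placeholder features, made-up feature names and dimensions, and scikit-learn's LogisticRegression, and is not the authors' implementation.

    # Hypothetical sketch, not the paper's code: per-frame CNN features are
    # assumed to be precomputed; random arrays stand in for them here.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def pool_frames(frame_features):
        # Compress a variable number of per-frame vectors into one fixed-length
        # clip vector by concatenating average- and max-pooling over time.
        return np.concatenate([frame_features.mean(axis=0),
                               frame_features.max(axis=0)])

    # Placeholder data: two feature types (as if taken from two different CNNs),
    # 100 clips with a varying number of frames each, and binary concept labels.
    n_clips = 100
    feature_dims = {"cnn_a": 256, "cnn_b": 128}
    labels = rng.integers(0, 2, size=n_clips)
    clip_vectors = {
        name: np.stack([pool_frames(rng.normal(size=(int(rng.integers(5, 30)), dim)))
                        for _ in range(n_clips)])
        for name, dim in feature_dims.items()
    }

    # One classifier per feature type.
    classifiers = {name: LogisticRegression(max_iter=1000).fit(X, labels)
                   for name, X in clip_vectors.items()}

    # Late fusion: average the per-classifier concept scores (scored on the
    # training clips here only to keep the example self-contained).
    fused = np.mean([clf.predict_proba(clip_vectors[name])[:, 1]
                     for name, clf in classifiers.items()], axis=0)
    print("fused concept score for the first clip:", fused[0])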

    Original language: English
    Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 1184-1188
    Number of pages: 5
    Volume: 2016-May
    ISBN (Electronic): 9781479999880
    DOI: 10.1109/ICASSP.2016.7471863
    Publication status: Published - 2016 May 18
    Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
    Duration: 2016 Mar 20 - 2016 Mar 25

    Other

    Other: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
    Country: China
    City: Shanghai
    Period: 16/3/20 - 16/3/25

    Keywords

    • CNN
    • generic object recognition
    • Semantic video indexing
    • TRECVID
    • video search

    ASJC Scopus subject areas

    • Signal Processing
    • Software
    • Electrical and Electronic Engineering

    Cite this

    Ueki, K., & Kobayashi, T. (2016). Improving semantic video indexing: Efforts in Waseda TRECVID 2015 SIN system. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings (Vol. 2016-May, pp. 1184-1188). [7471863] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2016.7471863
