Automatic indexing of multimedia content by integration of audio, spoken language, and visual information

Katsutoshi Ohtsuki, Katsuji Bessho, Yoshihiro Matsuo, Shoichi Matsunaga, Yoshihiko Hayashi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper describes an automatic multimedia content indexing system that includes acoustic segmentation, automatic speech recognition, topic segmentation, and video indexing features. The system is intended for indexing of multimedia news programs. Speech segments extracted from news content are delivered to the speech recognition module. The speech recognition result is segmented into topics using a segmentation algorithm based on word conceptual vectors. The indexing results derived from audio and speech information are integrated with video indexing results to extract the story structure. Experimental results show that topic segmentation using word conceptual vectors is superior to the conventional method using local word co-occurrence frequencies, and that the integrated segmentation provides better news story structures than would be possible with any single type of information.
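
The abstract does not spell out the segmentation algorithm, so the following is only a minimal sketch of the general idea of vector-based topic segmentation, not the authors' method. It assumes each word maps to a small vector, sums the vectors in a window on each side of a candidate boundary, and marks a boundary where the cosine similarity between the two windows reaches a local minimum below a threshold. The names `TOY_VECTORS`, `cohesion_scores`, and `topic_boundaries`, the 2-dimensional vectors, and the `window` and `threshold` values are all hypothetical and chosen for illustration.

```python
import math

# Hypothetical toy "conceptual vectors" (assumption): a real system would
# derive high-dimensional vectors from a large corpus.
TOY_VECTORS = {
    "election": (1.0, 0.0),
    "vote": (0.9, 0.1),
    "minister": (0.8, 0.2),
    "rain": (0.0, 1.0),
    "storm": (0.1, 0.9),
    "typhoon": (0.2, 0.8),
}


def window_vector(words, vectors):
    """Sum the vectors of the words in a window; unknown words add nothing."""
    dims = len(next(iter(vectors.values())))
    total = [0.0] * dims
    for word in words:
        for i, value in enumerate(vectors.get(word, (0.0,) * dims)):
            total[i] += value
    return total


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cohesion_scores(words, vectors, window=2):
    """Map each candidate boundary position to the similarity of the
    `window` words before it and the `window` words after it."""
    scores = {}
    for pos in range(window, len(words) - window + 1):
        left = window_vector(words[pos - window:pos], vectors)
        right = window_vector(words[pos:pos + window], vectors)
        scores[pos] = cosine(left, right)
    return scores


def topic_boundaries(words, vectors, window=2, threshold=0.5):
    """Return positions where cohesion is a local minimum below `threshold`."""
    scores = cohesion_scores(words, vectors, window)
    boundaries = []
    for pos, score in scores.items():
        prev_score = scores.get(pos - 1, 1.0)
        next_score = scores.get(pos + 1, 1.0)
        if score < threshold and score <= prev_score and score <= next_score:
            boundaries.append(pos)
    return boundaries


# Stand-in for a recognized word sequence: politics words, then weather words.
transcript = ["election", "vote", "minister", "vote",
              "rain", "storm", "typhoon", "rain"]
print(topic_boundaries(transcript, TOY_VECTORS))  # prints [4]
```

In this toy input the only boundary falls before word index 4, where the vocabulary shifts from politics to weather. Comparing summed vectors rather than exact word overlap is what lets two windows with no shared words still count as cohesive when their words are conceptually close.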

Original language: English
Title of host publication: 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 601-606
Number of pages: 6
ISBN (Electronic): 0780379802, 9780780379800
DOIs: https://doi.org/10.1109/ASRU.2003.1318508
Publication status: Published - 2003
Externally published: Yes
Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003 - St. Thomas, United States
Duration: 2003 Nov 30 to 2003 Dec 4

Publication series

Name: 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003

Other

Other: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
Country: United States
City: St. Thomas
Period: 03/11/30 to 03/12/4

ASJC Scopus subject areas

  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Ohtsuki, K., Bessho, K., Matsuo, Y., Matsunaga, S., & Hayashi, Y. (2003). Automatic indexing of multimedia content by integration of audio, spoken language, and visual information. In 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003 (pp. 601-606). [1318508] (2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU.2003.1318508