Automatic metadata generation and video editing based on speech and image recognition for medical education contents

Satoshi Tamura*, Koji Hashimoto, Zhu Jiong, Satoru Hayamizu, Hirotsugu Asai, Hideki Tanahashi, Makoto Kanagawa

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Citation (Scopus)

Abstract

This paper reports a metadata generation system as well as an automatic video edit system. The metadata are information described about the other data. In the audio metadata generation system, speech recognition using general language model (LM) and specialized LM is performed to input speech in order to obtain segment (event group) and audio metadata (event information) respectively. In the video edit system, visual metadata obtained by image recognition and audio metadata are combined into audio-visual metadata. Subsequently, multiple videos are edited to one video using the audio-visual metadata. Experiments were conducted to evaluate event detection of the systems using medical education contents, ACLS and BLS. The audio metadata system achieved about a 78% event detection correctness. In the edit system, an 87% event correctness was obtained by audio-visual metadata, and the survey proved that the edited video is appropriate and useful.

Original languageEnglish
Title of host publicationINTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
PublisherInternational Speech Communication Association
Pages2466-2469
Number of pages4
ISBN (Print)9781604234497
Publication statusPublished - 2006
Externally publishedYes
EventINTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP - Pittsburgh, PA, United States
Duration: 2006 Sept 172006 Sept 21

Publication series

NameINTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Volume5

Conference

ConferenceINTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Country/TerritoryUnited States
CityPittsburgh, PA
Period06/9/1706/9/21

Keywords

  • Audio-visual integration
  • Automatic video edit
  • Metadata
  • Speech recognition

ASJC Scopus subject areas

  • Computer Science(all)

Fingerprint

Dive into the research topics of 'Automatic metadata generation and video editing based on speech and image recognition for medical education contents'. Together they form a unique fingerprint.

Cite this