Automatic singing voice to music video generation via mashup of singing video clips

Tatsunori Hirai, Yukara Ikemiya, Kazuyoshi Yoshii, Tomoyasu Nakano, Masataka Goto, Shigeo Morishima

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents a system that takes the audio signal of any song sung by a singer as input and automatically generates a music video clip in which the singer appears to be actually singing the song. Although music video clips have gained popularity on video streaming services, not all existing songs have corresponding video clips. Given a song sung by a singer, our system generates a singing video clip by reusing existing singing video clips featuring that singer. More specifically, the system retrieves short fragments of singing video clips whose singing voices are similar to the one in the target song, and then concatenates these fragments using dynamic programming (DP). To achieve this, we propose a method for extracting singing scenes from music video clips by combining vocal activity detection (VAD) with mouth aperture detection (MAD). The results of a subjective experiment demonstrate the effectiveness of our system.
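The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the cost functions or the rule for combining the detectors, so the sketch assumes that a frame counts as a singing scene only when both VAD and MAD fire, and that fragment selection is a Viterbi-style DP over a per-segment dissimilarity cost plus a penalty for switching fragments. The names `singing_frames`, `select_fragments`, `match_cost`, and `switch_cost` are hypothetical.

```python
from typing import Callable, List, Sequence


def singing_frames(vad: Sequence[bool], mad: Sequence[bool]) -> List[bool]:
    """Assumed combination rule: a frame is a singing scene only if
    vocal activity (VAD) and an open mouth (MAD) are both detected."""
    return [v and m for v, m in zip(vad, mad)]


def select_fragments(
    match_cost: Sequence[Sequence[float]],
    switch_cost: Callable[[int, int], float],
) -> List[int]:
    """Viterbi-style DP over candidate video fragments.

    match_cost[t][k]: hypothetical dissimilarity between the singing voice
        in target segment t and the one in candidate fragment k.
    switch_cost(j, k): hypothetical penalty for placing fragment k
        directly after fragment j.
    Returns one fragment index per target segment, minimizing total cost.
    """
    n_frag = len(match_cost[0])
    best = [list(match_cost[0])]  # best[t][k]: cheapest path ending in k at t
    back = []                     # back[t-1][k]: predecessor of k at step t
    for t in range(1, len(match_cost)):
        row, ptr = [], []
        for k in range(n_frag):
            prev = best[-1]
            j_best = min(range(n_frag), key=lambda j: prev[j] + switch_cost(j, k))
            row.append(prev[j_best] + switch_cost(j_best, k) + match_cost[t][k])
            ptr.append(j_best)
        best.append(row)
        back.append(ptr)

    # Trace back from the cheapest final state.
    k = min(range(n_frag), key=lambda i: best[-1][i])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return path


if __name__ == "__main__":
    costs = [[0.0, 5.0], [5.0, 0.0], [0.0, 5.0]]
    # A small switching penalty lets the path follow the best match per segment.
    print(select_fragments(costs, lambda j, k: 0.0 if j == k else 1.0))
    # A large penalty favors staying on one fragment.
    print(select_fragments(costs, lambda j, k: 0.0 if j == k else 10.0))
```

The switching penalty is the design knob in this sketch: a low value tracks the best acoustic match for every segment, while a high value yields fewer cuts between source clips.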

Original language: English
Title of host publication: Proceedings of the 12th International Conference in Sound and Music Computing, SMC 2015
Publisher: Music Technology Research Group, Department of Computer Science, Maynooth University
Pages: 153-159
Number of pages: 7
ISBN (Electronic): 9780992746629
Publication status: Published - 2015
Externally published: Yes
Event: 12th International Conference on Sound and Music Computing, SMC 2015 - Maynooth, Ireland
Duration: 2015 Jul 30 → 2015 Aug 1

Other

Other: 12th International Conference on Sound and Music Computing, SMC 2015
Country: Ireland
City: Maynooth
Period: 15/7/30 → 15/8/1

ASJC Scopus subject areas

  • Music
  • Computer Science Applications
  • Media Technology


Cite this

Hirai, T., Ikemiya, Y., Yoshii, K., Nakano, T., Goto, M., & Morishima, S. (2015). Automatic singing voice to music video generation via mashup of singing video clips. In Proceedings of the 12th International Conference in Sound and Music Computing, SMC 2015 (pp. 153-159). Music Technology Research Group, Department of Computer Science, Maynooth University.