Automatic singing voice to music video generation via mashup of singing video clips

Tatsunori Hirai, Yukara Ikemiya, Kazuyoshi Yoshii, Tomoyasu Nakano, Masataka Goto, Shigeo Morishima

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper presents a system that takes audio signals of any song sung by a singer as the input and automatically generates a music video clip in which the singer appears to be actually singing the song. Although music video clips have gained popularity on video streaming services, not all existing songs have corresponding video clips. Given a song sung by a singer, our system generates a singing video clip by reusing existing singing video clips featuring the singer. More specifically, the system retrieves short fragments of singing video clips that include singing voices similar to that in the target song, and then concatenates these fragments using dynamic programming (DP). To achieve this, we propose a method to extract singing scenes from music video clips by combining vocal activity detection (VAD) with mouth aperture detection (MAD). The subjective experimental results demonstrate the effectiveness of our system.

Original language: English
Title of host publication: Proceedings of the 12th International Conference in Sound and Music Computing, SMC 2015
Publisher: Music Technology Research Group, Department of Computer Science, Maynooth University
Pages: 153-159
Number of pages: 7
ISBN (Electronic): 9780992746629
Publication status: Published - 2015
Externally published: Yes
Event: 12th International Conference on Sound and Music Computing, SMC 2015 - Maynooth, Ireland
Duration: 2015 Jul 30 to 2015 Aug 1

Other

Other: 12th International Conference on Sound and Music Computing, SMC 2015
Country: Ireland
City: Maynooth
Period: 15/7/30 to 15/8/1

Fingerprint

Video streaming
Dynamic programming
Music Videos
Song
Singers

ASJC Scopus subject areas

  • Music
  • Computer Science Applications
  • Media Technology

Cite this

Hirai, T., Ikemiya, Y., Yoshii, K., Nakano, T., Goto, M., & Morishima, S. (2015). Automatic singing voice to music video generation via mashup of singing video clips. In Proceedings of the 12th International Conference in Sound and Music Computing, SMC 2015 (pp. 153-159). Music Technology Research Group, Department of Computer Science, Maynooth University.

Automatic singing voice to music video generation via mashup of singing video clips. / Hirai, Tatsunori; Ikemiya, Yukara; Yoshii, Kazuyoshi; Nakano, Tomoyasu; Goto, Masataka; Morishima, Shigeo.

Proceedings of the 12th International Conference in Sound and Music Computing, SMC 2015. Music Technology Research Group, Department of Computer Science, Maynooth University, 2015. p. 153-159.

Hirai, T, Ikemiya, Y, Yoshii, K, Nakano, T, Goto, M & Morishima, S 2015, Automatic singing voice to music video generation via mashup of singing video clips. in Proceedings of the 12th International Conference in Sound and Music Computing, SMC 2015. Music Technology Research Group, Department of Computer Science, Maynooth University, pp. 153-159, 12th International Conference on Sound and Music Computing, SMC 2015, Maynooth, Ireland, 15/7/30.
Hirai T, Ikemiya Y, Yoshii K, Nakano T, Goto M, Morishima S. Automatic singing voice to music video generation via mashup of singing video clips. In Proceedings of the 12th International Conference in Sound and Music Computing, SMC 2015. Music Technology Research Group, Department of Computer Science, Maynooth University. 2015. p. 153-159
Hirai, Tatsunori ; Ikemiya, Yukara ; Yoshii, Kazuyoshi ; Nakano, Tomoyasu ; Goto, Masataka ; Morishima, Shigeo. / Automatic singing voice to music video generation via mashup of singing video clips. Proceedings of the 12th International Conference in Sound and Music Computing, SMC 2015. Music Technology Research Group, Department of Computer Science, Maynooth University, 2015. pp. 153-159
@inproceedings{3bfacb73b9dd45adb4d4b46c0958bae6,
title = "Automatic singing voice to music video generation via mashup of singing video clips",
abstract = "This paper presents a system that takes audio signals of any song sung by a singer as the input and automatically generates a music video clip in which the singer appears to be actually singing the song. Although music video clips have gained popularity on video streaming services, not all existing songs have corresponding video clips. Given a song sung by a singer, our system generates a singing video clip by reusing existing singing video clips featuring the singer. More specifically, the system retrieves short fragments of singing video clips that include singing voices similar to that in the target song, and then concatenates these fragments using dynamic programming (DP). To achieve this, we propose a method to extract singing scenes from music video clips by combining vocal activity detection (VAD) with mouth aperture detection (MAD). The subjective experimental results demonstrate the effectiveness of our system.",
author = "Tatsunori Hirai and Yukara Ikemiya and Kazuyoshi Yoshii and Tomoyasu Nakano and Masataka Goto and Shigeo Morishima",
year = "2015",
language = "English",
pages = "153--159",
booktitle = "Proceedings of the 12th International Conference in Sound and Music Computing, SMC 2015",
publisher = "Music Technology Research Group, Department of Computer Science, Maynooth University",

}

TY - GEN

T1 - Automatic singing voice to music video generation via mashup of singing video clips

AU - Hirai, Tatsunori

AU - Ikemiya, Yukara

AU - Yoshii, Kazuyoshi

AU - Nakano, Tomoyasu

AU - Goto, Masataka

AU - Morishima, Shigeo

PY - 2015

Y1 - 2015

N2 - This paper presents a system that takes audio signals of any song sung by a singer as the input and automatically generates a music video clip in which the singer appears to be actually singing the song. Although music video clips have gained popularity on video streaming services, not all existing songs have corresponding video clips. Given a song sung by a singer, our system generates a singing video clip by reusing existing singing video clips featuring the singer. More specifically, the system retrieves short fragments of singing video clips that include singing voices similar to that in the target song, and then concatenates these fragments using dynamic programming (DP). To achieve this, we propose a method to extract singing scenes from music video clips by combining vocal activity detection (VAD) with mouth aperture detection (MAD). The subjective experimental results demonstrate the effectiveness of our system.

AB - This paper presents a system that takes audio signals of any song sung by a singer as the input and automatically generates a music video clip in which the singer appears to be actually singing the song. Although music video clips have gained popularity on video streaming services, not all existing songs have corresponding video clips. Given a song sung by a singer, our system generates a singing video clip by reusing existing singing video clips featuring the singer. More specifically, the system retrieves short fragments of singing video clips that include singing voices similar to that in the target song, and then concatenates these fragments using dynamic programming (DP). To achieve this, we propose a method to extract singing scenes from music video clips by combining vocal activity detection (VAD) with mouth aperture detection (MAD). The subjective experimental results demonstrate the effectiveness of our system.

UR - http://www.scopus.com/inward/record.url?scp=84988484675&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84988484675&partnerID=8YFLogxK

M3 - Conference contribution

SP - 153

EP - 159

BT - Proceedings of the 12th International Conference in Sound and Music Computing, SMC 2015

PB - Music Technology Research Group, Department of Computer Science, Maynooth University

ER -