Automatic synchronization between lyrics and music CD recordings based on Viterbi alignment of segregated vocal signals

Hiromasa Fujihara, Masataka Goto, Jun Ogata, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

49 Citations (Scopus)

Abstract

This paper describes a system that automatically synchronizes polyphonic musical audio signals with their corresponding lyrics. Although existing methods can synchronize monophonic speech signals with corresponding text transcriptions by using Viterbi alignment techniques, they cannot be applied to vocals in CD recordings because accompaniment sounds often overlap with the vocals. To align lyrics with such vocals, we therefore developed three methods: a method for segregating vocals from polyphonic sound mixtures, a method for detecting vocal sections, and a method for adapting a speech-recognizer phone model to segregated vocal signals. Experimental results for 10 Japanese popular-music songs showed that our system can synchronize music and lyrics with satisfactory accuracy for 8 of the songs.
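The Viterbi alignment step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that vocal segregation, vocal-section detection, and phone-model adaptation have already produced a matrix of per-frame log-likelihoods, and it only shows how a left-to-right dynamic program could assign each audio frame to one phone of the lyrics. The names `viterbi_align`, `phone_onsets`, and `log_lik` are hypothetical and chosen for illustration.

```python
import math


def viterbi_align(log_lik):
    """Force-align frames to a fixed phone sequence (illustrative sketch).

    log_lik[t][s] is an assumed input: the log-likelihood of audio frame t
    under the s-th phone of the lyrics, in lyric order. The returned path
    gives one phone index per frame, starts at phone 0, ends at the last
    phone, and never moves backwards or skips a phone.
    """
    n_frames = len(log_lik)
    n_phones = len(log_lik[0])
    if n_frames < n_phones:
        raise ValueError("need at least one frame per phone")

    neg_inf = -math.inf
    score = [[neg_inf] * n_phones for _ in range(n_frames)]
    back = [[0] * n_phones for _ in range(n_frames)]
    score[0][0] = log_lik[0][0]

    for t in range(1, n_frames):
        for s in range(n_phones):
            stay = score[t - 1][s]  # remain in the same phone
            advance = score[t - 1][s - 1] if s > 0 else neg_inf  # next phone
            if advance > stay:
                best, prev = advance, s - 1
            else:
                best, prev = stay, s
            if best > neg_inf:  # state is reachable
                score[t][s] = best + log_lik[t][s]
                back[t][s] = prev

    # Trace back from the last phone at the last frame.
    path = [n_phones - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def phone_onsets(path):
    """Return the first frame index at which each phone appears."""
    onsets = {}
    for t, s in enumerate(path):
        onsets.setdefault(s, t)
    return [onsets[s] for s in sorted(onsets)]
```

Multiplying each onset frame index by the frame hop of the feature extractor (a value the abstract does not state) would give the time at which each phone of the lyrics begins.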

Original language: English
Title of host publication: ISM 2006 - 8th IEEE International Symposium on Multimedia
Pages: 257-264
Number of pages: 8
DOIs: https://doi.org/10.1109/ISM.2006.38
Publication status: Published - 2006
Externally published: Yes
Event: ISM 2006 - 8th IEEE International Symposium on Multimedia - San Diego, CA
Duration: 2006 Dec 11 - 2006 Dec 13

Fingerprint

Synchronization
Acoustic waves
Transcription

ASJC Scopus subject areas

  • Computer Networks and Communications

Cite this

Fujihara, H., Goto, M., Ogata, J., Komatani, K., Ogata, T., & Okuno, H. G. (2006). Automatic synchronization between lyrics and music CD recordings based on Viterbi alignment of segregated vocal signals. In ISM 2006 - 8th IEEE International Symposium on Multimedia (pp. 257-264). [4061176] https://doi.org/10.1109/ISM.2006.38

@inproceedings{9de48e82a53c4e388baee490b9d8d654,
title = "Automatic synchronization between lyrics and music CD recordings based on Viterbi alignment of segregated vocal signals",
abstract = "This paper describes a system that automatically synchronizes polyphonic musical audio signals with their corresponding lyrics. Although existing methods can synchronize monophonic speech signals with corresponding text transcriptions by using Viterbi alignment techniques, they cannot be applied to vocals in CD recordings because accompaniment sounds often overlap with the vocals. To align lyrics with such vocals, we therefore developed three methods: a method for segregating vocals from polyphonic sound mixtures, a method for detecting vocal sections, and a method for adapting a speech-recognizer phone model to segregated vocal signals. Experimental results for 10 Japanese popular-music songs showed that our system can synchronize music and lyrics with satisfactory accuracy for 8 of the songs.",
author = "Hiromasa Fujihara and Masataka Goto and Jun Ogata and Kazunori Komatani and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2006",
doi = "10.1109/ISM.2006.38",
language = "English",
isbn = "0769527469",
pages = "257--264",
booktitle = "ISM 2006 - 8th IEEE International Symposium on Multimedia",

}

TY  - GEN
T1  - Automatic synchronization between lyrics and music CD recordings based on Viterbi alignment of segregated vocal signals
AU  - Fujihara, Hiromasa
AU  - Goto, Masataka
AU  - Ogata, Jun
AU  - Komatani, Kazunori
AU  - Ogata, Tetsuya
AU  - Okuno, Hiroshi G.
PY  - 2006
Y1  - 2006
AB  - This paper describes a system that automatically synchronizes polyphonic musical audio signals with their corresponding lyrics. Although existing methods can synchronize monophonic speech signals with corresponding text transcriptions by using Viterbi alignment techniques, they cannot be applied to vocals in CD recordings because accompaniment sounds often overlap with the vocals. To align lyrics with such vocals, we therefore developed three methods: a method for segregating vocals from polyphonic sound mixtures, a method for detecting vocal sections, and a method for adapting a speech-recognizer phone model to segregated vocal signals. Experimental results for 10 Japanese popular-music songs showed that our system can synchronize music and lyrics with satisfactory accuracy for 8 of the songs.
UR  - http://www.scopus.com/inward/record.url?scp=34547508425&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=34547508425&partnerID=8YFLogxK
U2  - 10.1109/ISM.2006.38
DO  - 10.1109/ISM.2006.38
M3  - Conference contribution
SN  - 0769527469
SN  - 9780769527468
SP  - 257
EP  - 264
BT  - ISM 2006 - 8th IEEE International Symposium on Multimedia
ER  -