Lyric synchronizer

Automatic synchronization system between musical audio signals and lyrics

Hiromasa Fujihara, Masataka Goto, Jun Ogata, Hiroshi G. Okuno

Research output: Contribution to journal › Article

38 Citations (Scopus)

Abstract

This paper describes a system that can automatically synchronize polyphonic musical audio signals with their corresponding lyrics. Although methods for synchronizing monophonic speech signals and corresponding text transcriptions by using Viterbi alignment techniques have been proposed, these methods cannot be applied to vocals in CD recordings because vocals are often overlapped by accompaniment sounds. In addition to a conventional method for reducing the influence of the accompaniment sounds, we therefore developed four methods to overcome this problem: a method for detecting vocal sections, a method for constructing robust phoneme networks, a method for detecting fricative sounds, and a method for adapting a speech-recognizer phone model to segregated vocal signals. We then report experimental results for each of these methods and also describe our music playback interface that utilizes our system for synchronizing music and lyrics.
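The conventional starting point the abstract mentions is Viterbi (forced) alignment. The phoneme sequence derived from the lyrics is fixed, and only its timing is searched, so every audio frame is assigned to exactly one phoneme state in left-to-right order. The following is a minimal sketch of that generic technique in plain Python. It is not the authors' system, which adds vocal-section detection, robust phoneme networks, fricative detection and phone-model adaptation. The sketch assumes that per-frame log-likelihoods for each phoneme state are already available from some acoustic model (not shown). The function names `forced_align` and `segment_starts` are hypothetical.

```python
import math
from typing import List


def forced_align(log_lik: List[List[float]]) -> List[int]:
    """Viterbi alignment of T frames to N states in a fixed left-to-right order.

    log_lik[t][n] is an assumed log-likelihood of frame t under phoneme state n.
    Each frame either stays in the current state or advances by one state.
    Returns the state index assigned to each frame.
    """
    T, N = len(log_lik), len(log_lik[0])
    if T < N:
        raise ValueError("need at least as many frames as phoneme states")
    score = [[-math.inf] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]  # 1 = arrived by advancing from n - 1
    score[0][0] = log_lik[0][0]  # alignment must start in the first state
    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1][n]
            advance = score[t - 1][n - 1] if n > 0 else -math.inf
            if advance > stay:
                score[t][n] = advance + log_lik[t][n]
                back[t][n] = 1
            else:
                score[t][n] = stay + log_lik[t][n]
    # Backtrack from the last state at the last frame.
    n = N - 1
    path = [n]
    for t in range(T - 1, 0, -1):
        n -= back[t][n]
        path.append(n)
    path.reverse()
    return path


def segment_starts(path: List[int]) -> List[int]:
    """Frame index at which each phoneme state begins."""
    return [t for t in range(len(path)) if t == 0 or path[t] != path[t - 1]]
```

Multiplying each start frame by an assumed hop size (for example 10 ms per frame) would turn the result into timestamps for highlighting lyrics during playback. With accompaniment present, the likelihoods themselves become unreliable. That is the problem the four methods in the abstract address.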

Original language: English
Article number: 5876296
Pages (from-to): 1252-1261
Number of pages: 10
Journal: IEEE Journal on Selected Topics in Signal Processing
Volume: 5
Issue number: 6
DOI: 10.1109/JSTSP.2011.2159577
Publication status: Published, October 2011
Externally published: Yes

Fingerprint

  • Synchronization
  • Acoustic waves
  • Transcription

Keywords

  • Alignment
  • Lyrics
  • Singing voice
  • Viterbi algorithm
  • Vocal

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing

Cite this

Lyric synchronizer : Automatic synchronization system between musical audio signals and lyrics. / Fujihara, Hiromasa; Goto, Masataka; Ogata, Jun; Okuno, Hiroshi G.

In: IEEE Journal on Selected Topics in Signal Processing, Vol. 5, No. 6, 5876296, 10.2011, p. 1252-1261.

@article{8e0d77a6c8a04395a74fadd9f1decd72,
title = "Lyric synchronizer: Automatic synchronization system between musical audio signals and lyrics",
keywords = "Alignment, Lyrics, Singing voice, Viterbi algorithm, Vocal",
author = "Hiromasa Fujihara and Masataka Goto and Jun Ogata and Okuno, {Hiroshi G.}",
year = "2011",
month = "10",
doi = "10.1109/JSTSP.2011.2159577",
language = "English",
volume = "5",
pages = "1252--1261",
journal = "IEEE Journal on Selected Topics in Signal Processing",
issn = "1932-4553",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "6",

}

TY - JOUR

T1 - Lyric synchronizer

T2 - Automatic synchronization system between musical audio signals and lyrics

AU - Fujihara, Hiromasa

AU - Goto, Masataka

AU - Ogata, Jun

AU - Okuno, Hiroshi G.

PY - 2011/10

Y1 - 2011/10

KW - Alignment

KW - Lyrics

KW - Singing voice

KW - Viterbi algorithm

KW - Vocal

UR - http://www.scopus.com/inward/record.url?scp=80052978670&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80052978670&partnerID=8YFLogxK

U2 - 10.1109/JSTSP.2011.2159577

DO - 10.1109/JSTSP.2011.2159577

M3 - Article

VL - 5

SP - 1252

EP - 1261

JO - IEEE Journal on Selected Topics in Signal Processing

JF - IEEE Journal on Selected Topics in Signal Processing

SN - 1932-4553

IS - 6

M1 - 5876296

ER -