Polyphonic audio-to-score alignment based on Bayesian latent harmonic allocation hidden Markov model

Akira Maezawa, Hiroshi G. Okuno, Tetsuya Ogata, Masataka Goto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

This paper presents a Bayesian method for temporally aligning a music score and an audio rendition. A critical problem in audio-to-score alignment is in dealing with the wide variety of timbre and volume of the audio rendition. In contrast with existing works that achieve this through ad-hoc feature design or careful training of tone models, we propose a Bayesian audio-to-score alignment method by modeling music performance as a Bayesian Hidden Markov Model, each state of which emits a Bayesian signal model based on Latent Harmonic Allocation. After attenuating reverberation, variational Bayes method is used to iteratively adapt the alignment, instrument tone model and the volume balance at each position of the score. The method is evaluated using sixty works of classical music of a variety of instrumentation ranging from solo piano to full orchestra. We verify that our method improves the alignment accuracy compared to dynamic time warping based on chroma vector for orchestral music, or our method employed in a maximum likelihood setting.

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Pages: 185-188
Number of pages: 4
DOIs: https://doi.org/10.1109/ICASSP.2011.5946371
Publication status: Published - 2011
Externally published: Yes
Event: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague
Duration: 2011 May 22 - 2011 May 27

Other

Other: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
City: Prague
Period: 11/5/22 - 11/5/27

Fingerprint

Hidden Markov models
Reverberation
Maximum likelihood

Keywords

  • Audio-to-score alignment
  • Variational Bayes inference

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Maezawa, A., Okuno, H. G., Ogata, T., & Goto, M. (2011). Polyphonic audio-to-score alignment based on Bayesian latent harmonic allocation hidden Markov model. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 185-188). [5946371] https://doi.org/10.1109/ICASSP.2011.5946371

Polyphonic audio-to-score alignment based on Bayesian latent harmonic allocation hidden Markov model. / Maezawa, Akira; Okuno, Hiroshi G.; Ogata, Tetsuya; Goto, Masataka.

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2011. p. 185-188 5946371.

Maezawa, A, Okuno, HG, Ogata, T & Goto, M 2011, Polyphonic audio-to-score alignment based on Bayesian latent harmonic allocation hidden Markov model. in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings., 5946371, pp. 185-188, 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, Prague, 11/5/22. https://doi.org/10.1109/ICASSP.2011.5946371
Maezawa A, Okuno HG, Ogata T, Goto M. Polyphonic audio-to-score alignment based on Bayesian latent harmonic allocation hidden Markov model. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2011. p. 185-188. 5946371 https://doi.org/10.1109/ICASSP.2011.5946371
Maezawa, Akira ; Okuno, Hiroshi G. ; Ogata, Tetsuya ; Goto, Masataka. / Polyphonic audio-to-score alignment based on Bayesian latent harmonic allocation hidden Markov model. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2011. pp. 185-188
@inproceedings{bae4cef8f0c64035afa3f97394fd81ce,
title = "Polyphonic audio-to-score alignment based on Bayesian latent harmonic allocation hidden Markov model",
abstract = "This paper presents a Bayesian method for temporally aligning a music score and an audio rendition. A critical problem in audio-to-score alignment is in dealing with the wide variety of timbre and volume of the audio rendition. In contrast with existing works that achieve this through ad-hoc feature design or careful training of tone models, we propose a Bayesian audio-to-score alignment method by modeling music performance as a Bayesian Hidden Markov Model, each state of which emits a Bayesian signal model based on Latent Harmonic Allocation. After attenuating reverberation, variational Bayes method is used to iteratively adapt the alignment, instrument tone model and the volume balance at each position of the score. The method is evaluated using sixty works of classical music of a variety of instrumentation ranging from solo piano to full orchestra. We verify that our method improves the alignment accuracy compared to dynamic time warping based on chroma vector for orchestral music, or our method employed in a maximum likelihood setting.",
keywords = "Audio-to-score alignment, Variational Bayes inference",
author = "Akira Maezawa and Okuno, {Hiroshi G.} and Tetsuya Ogata and Masataka Goto",
year = "2011",
doi = "10.1109/ICASSP.2011.5946371",
language = "English",
isbn = "9781457705397",
pages = "185--188",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

}

TY - GEN
T1 - Polyphonic audio-to-score alignment based on Bayesian latent harmonic allocation hidden Markov model
AU - Maezawa, Akira
AU - Okuno, Hiroshi G.
AU - Ogata, Tetsuya
AU - Goto, Masataka
PY - 2011
Y1 - 2011

N2 - This paper presents a Bayesian method for temporally aligning a music score and an audio rendition. A critical problem in audio-to-score alignment is in dealing with the wide variety of timbre and volume of the audio rendition. In contrast with existing works that achieve this through ad-hoc feature design or careful training of tone models, we propose a Bayesian audio-to-score alignment method by modeling music performance as a Bayesian Hidden Markov Model, each state of which emits a Bayesian signal model based on Latent Harmonic Allocation. After attenuating reverberation, variational Bayes method is used to iteratively adapt the alignment, instrument tone model and the volume balance at each position of the score. The method is evaluated using sixty works of classical music of a variety of instrumentation ranging from solo piano to full orchestra. We verify that our method improves the alignment accuracy compared to dynamic time warping based on chroma vector for orchestral music, or our method employed in a maximum likelihood setting.

AB - This paper presents a Bayesian method for temporally aligning a music score and an audio rendition. A critical problem in audio-to-score alignment is in dealing with the wide variety of timbre and volume of the audio rendition. In contrast with existing works that achieve this through ad-hoc feature design or careful training of tone models, we propose a Bayesian audio-to-score alignment method by modeling music performance as a Bayesian Hidden Markov Model, each state of which emits a Bayesian signal model based on Latent Harmonic Allocation. After attenuating reverberation, variational Bayes method is used to iteratively adapt the alignment, instrument tone model and the volume balance at each position of the score. The method is evaluated using sixty works of classical music of a variety of instrumentation ranging from solo piano to full orchestra. We verify that our method improves the alignment accuracy compared to dynamic time warping based on chroma vector for orchestral music, or our method employed in a maximum likelihood setting.

KW - Audio-to-score alignment
KW - Variational Bayes inference
UR - http://www.scopus.com/inward/record.url?scp=80051603530&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80051603530&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2011.5946371
DO - 10.1109/ICASSP.2011.5946371
M3 - Conference contribution
AN - SCOPUS:80051603530
SN - 9781457705397
SP - 185
EP - 188
BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ER -