Bayesian audio-to-score alignment based on joint inference of Timbre, Volume, Tempo, and note onset timings

Akira Maezawa, Hiroshi G. Okuno

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

This article presents an offline method for aligning an audio signal to individual instrumental parts constituting a musical score. The proposed method is based on fitting multiple hidden semi-Markov models (HSMMs) to the observed audio signal. The emission probability of each state of the HSMM is described using latent harmonic allocation (LHA), a Bayesian model of a harmonic sound mixture. Each HSMM corresponds to one musical instrument's part, and the state duration probability is conditioned on a linear dynamics system (LDS) tempo model. Variational Bayesian inference is used to jointly infer LHA, HSMM, and the LDS. We evaluate the capability of the method to align musical audio to its score, under reverberation, structural variations, and fluctuations in onset timing among different parts.
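As a rough illustration of the semi-Markov idea in the abstract, the sketch below aligns a single part to a sequence of audio frames with explicit-duration dynamic programming. It is not the authors' method. It assumes one part only, a fixed tempo in place of the LDS tempo model, a precomputed per-frame log-likelihood table in place of LHA, and a Viterbi-style best path in place of variational Bayesian inference. The names `align_part`, `frame_loglik`, `beats`, `sec_per_beat`, `frame_sec`, and `dur_sigma` are hypothetical.

```python
import math


def align_part(frame_loglik, beats, sec_per_beat, frame_sec, dur_sigma=0.25):
    """Toy left-to-right explicit-duration (semi-Markov) alignment.

    frame_loglik[t][n]: log-likelihood of audio frame t under score note n
                        (a stand-in for the emission model; assumed given).
    beats[n]:           notated length of note n, in beats.
    sec_per_beat:       fixed tempo (assumption; the paper uses an LDS).
    frame_sec:          duration of one audio frame, in seconds.
    Returns the onset frame index of every note.
    """
    num_frames, num_notes = len(frame_loglik), len(beats)

    # Prefix sums per note, so a segment's emission score costs O(1).
    cum = [[0.0] * (num_frames + 1) for _ in range(num_notes)]
    for n in range(num_notes):
        for t in range(num_frames):
            cum[n][t + 1] = cum[n][t] + frame_loglik[t][n]

    def log_duration(n, d):
        # Duration score conditioned on tempo: penalize the squared
        # deviation of log(d) from the log of the expected frame count.
        expected = beats[n] * sec_per_beat / frame_sec
        z = (math.log(d) - math.log(expected)) / dur_sigma
        return -0.5 * z * z

    neg_inf = float("-inf")
    # best[n][t]: best score when the first n notes cover frames 0..t-1.
    best = [[neg_inf] * (num_frames + 1) for _ in range(num_notes + 1)]
    back = [[0] * (num_frames + 1) for _ in range(num_notes + 1)]
    best[0][0] = 0.0

    for n in range(1, num_notes + 1):
        for t in range(n, num_frames + 1):
            for s in range(n - 1, t):  # note n-1 occupies frames s..t-1
                if best[n - 1][s] == neg_inf:
                    continue
                score = (best[n - 1][s]
                         + cum[n - 1][t] - cum[n - 1][s]
                         + log_duration(n - 1, t - s))
                if score > best[n][t]:
                    best[n][t] = score
                    back[n][t] = s

    # Trace back from the last frame to recover each note's onset.
    onsets, t = [], num_frames
    for n in range(num_notes, 0, -1):
        t = back[n][t]
        onsets.append(t)
    return onsets[::-1]
```

Calling `align_part(frame_loglik, beats, sec_per_beat, frame_sec)` returns one onset frame per note. Extending the sketch to several parts, a time-varying tempo, or learned emission parameters would require the joint inference the abstract describes.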

Original language: English
Pages (from-to): 74-87
Number of pages: 14
Journal: Computer Music Journal
Volume: 39
Issue number: 1
DOIs: 10.1162/COMJ-a-00286
Publication status: Published - 2015 Mar 27
Externally published: Yes

Fingerprint

Dynamical systems
Musical instruments
Reverberation
Inference
Onset
Alignment
Timbre
Markov Model
Acoustic waves
Harmonics
Dynamic Systems
Musical Score
Musical Instruments
Fluctuations
Bayesian Inference
Bayesian Model
Sound

ASJC Scopus subject areas

  • Computer Science Applications
  • Media Technology
  • Music

Cite this

Bayesian audio-to-score alignment based on joint inference of Timbre, Volume, Tempo, and note onset timings. / Maezawa, Akira; Okuno, Hiroshi G.

In: Computer Music Journal, Vol. 39, No. 1, 27.03.2015, p. 74-87.

@article{6a57f45789514b4bbeb4901ba01da18b,
title = "Bayesian audio-to-score alignment based on joint inference of Timbre, Volume, Tempo, and note onset timings",
abstract = "This article presents an offline method for aligning an audio signal to individual instrumental parts constituting a musical score. The proposed method is based on fitting multiple hidden semi-Markov models (HSMMs) to the observed audio signal. The emission probability of each state of the HSMM is described using latent harmonic allocation (LHA), a Bayesian model of a harmonic sound mixture. Each HSMM corresponds to one musical instrument's part, and the state duration probability is conditioned on a linear dynamics system (LDS) tempo model. Variational Bayesian inference is used to jointly infer LHA, HSMM, and the LDS. We evaluate the capability of the method to align musical audio to its score, under reverberation, structural variations, and fluctuations in onset timing among different parts.",
author = "Maezawa, Akira and Okuno, {Hiroshi G.}",
year = "2015",
month = "3",
day = "27",
doi = "10.1162/COMJ-a-00286",
language = "English",
volume = "39",
pages = "74--87",
journal = "Computer Music Journal",
issn = "0148-9267",
publisher = "MIT Press Journals",
number = "1",
}

TY - JOUR

T1 - Bayesian audio-to-score alignment based on joint inference of Timbre, Volume, Tempo, and note onset timings

AU - Maezawa, Akira

AU - Okuno, Hiroshi G.

PY - 2015/3/27

Y1 - 2015/3/27

N2 - This article presents an offline method for aligning an audio signal to individual instrumental parts constituting a musical score. The proposed method is based on fitting multiple hidden semi-Markov models (HSMMs) to the observed audio signal. The emission probability of each state of the HSMM is described using latent harmonic allocation (LHA), a Bayesian model of a harmonic sound mixture. Each HSMM corresponds to one musical instrument's part, and the state duration probability is conditioned on a linear dynamics system (LDS) tempo model. Variational Bayesian inference is used to jointly infer LHA, HSMM, and the LDS. We evaluate the capability of the method to align musical audio to its score, under reverberation, structural variations, and fluctuations in onset timing among different parts.

UR - http://www.scopus.com/inward/record.url?scp=84925601978&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84925601978&partnerID=8YFLogxK

U2 - 10.1162/COMJ-a-00286

DO - 10.1162/COMJ-a-00286

M3 - Article

AN - SCOPUS:84925601978

VL - 39

SP - 74

EP - 87

JO - Computer Music Journal

JF - Computer Music Journal

SN - 0148-9267

IS - 1

ER -