Bayesian audio alignment based on a unified generative model of music composition and performance

Akira Maezawa, Katsutoshi Itoyama, Kazuyoshi Yoshii, Hiroshi G. Okuno

Research output: Contribution to conference › Paper › peer-review

5 Citations (Scopus)

Abstract

This paper presents a new probabilistic model that can align multiple performances of a particular piece of music. Conventionally, dynamic time warping (DTW) and left-to-right hidden Markov models (HMMs) have often been used for audio-to-audio alignment based on a shallow acoustic similarity between performances. Those methods, however, cannot distinguish between the latent musical structure common to all performances and the temporal dynamics unique to each performance. To solve this problem, our model explicitly represents two state sequences: a top-level sequence that determines the common structure inherent in the music itself and a bottom-level sequence that determines the actual temporal fluctuation of each performance. These two sequences are fused into a hierarchical Bayesian HMM and can be learned simultaneously from the given performances. Since the top-level sequence assigns the same state to note combinations that appear repeatedly within a piece of music, we can unveil the latent structure of the piece. Moreover, we can easily compare different performances of the same piece by analyzing the bottom-level sequences. Experimental evaluation showed that our method outperformed the conventional methods.
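To make the two-level idea concrete, the following is a minimal toy sketch (not the authors' implementation) of the generative process the abstract describes: a single top-level state sequence encodes the note combinations shared by all performances, and each performance expands those states in time with its own tempo parameter before emitting chroma-like features. All names, dimensions, and parameter values are illustrative assumptions.

```python
# Toy sketch of a two-level generative model for multiple performances.
# Assumptions: 4 top-level states, 12-dim chroma-like features, geometric
# duration model per performance; none of this is taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

N_TOP_STATES = 4    # distinct note combinations in the piece (assumed)
N_FEATURES = 12     # chroma-like feature dimension (assumed)
TOP_SEQ_LEN = 16    # length of the common top-level sequence (assumed)

# Top-level sequence: the structure inherent in the piece itself.
# Recurring states model note combinations that repeat within the piece.
top_transitions = rng.dirichlet(np.ones(N_TOP_STATES), size=N_TOP_STATES)
top_sequence = [rng.integers(N_TOP_STATES)]
for _ in range(TOP_SEQ_LEN - 1):
    top_sequence.append(rng.choice(N_TOP_STATES, p=top_transitions[top_sequence[-1]]))

# Each top-level state has its own emission distribution (mean feature vector).
emission_means = rng.random((N_TOP_STATES, N_FEATURES))

def generate_performance(top_sequence, stay_prob, noise_std=0.05):
    """Bottom-level sequence: expand each top-level state into a run of frames
    whose length depends on a performance-specific tempo parameter (stay_prob),
    then emit noisy features around that state's mean."""
    frames, labels = [], []
    for state in top_sequence:
        # Geometric duration model: larger stay_prob -> slower performance.
        duration = rng.geometric(1.0 - stay_prob)
        for _ in range(duration):
            frames.append(emission_means[state]
                          + noise_std * rng.standard_normal(N_FEATURES))
            labels.append(state)
    return np.array(frames), np.array(labels)

# Two performances of the same piece: identical top-level sequence,
# different temporal fluctuations at the bottom level.
fast_perf, fast_labels = generate_performance(top_sequence, stay_prob=0.3)
slow_perf, slow_labels = generate_performance(top_sequence, stay_prob=0.7)

print("top-level sequence:", top_sequence)
print("fast performance frames:", fast_perf.shape[0])
print("slow performance frames:", slow_perf.shape[0])
```

In the paper's setting, inference would run in the opposite direction: given the observed performances, the shared top-level sequence and the per-performance bottom-level sequences are learned jointly, and aligning two performances amounts to comparing their bottom-level sequences against the common top-level structure.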

Original language: English
Pages: 233-238
Number of pages: 6
Publication status: Published - 2014 Jan 1
Event: 15th International Society for Music Information Retrieval Conference, ISMIR 2014 - Taipei, Taiwan, Province of China
Duration: 2014 Oct 27 - 2014 Oct 31

Conference

Conference: 15th International Society for Music Information Retrieval Conference, ISMIR 2014
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 14/10/27 - 14/10/31

ASJC Scopus subject areas

  • Music
  • Information Systems
