Bayesian audio alignment based on a unified generative model of music composition and performance

Akira Maezawa, Katsutoshi Itoyama, Kazuyoshi Yoshii, Hiroshi G. Okuno

Research output: Contribution to conference › Paper

4 Citations (Scopus)

Abstract

This paper presents a new probabilistic model that can align multiple performances of a particular piece of music. Conventionally, dynamic time warping (DTW) and left-to-right hidden Markov models (HMMs) have often been used for audio-to-audio alignment based on shallow acoustic similarity between performances. These methods, however, cannot distinguish between the latent musical structure common to all performances and the temporal dynamics unique to each performance. To solve this problem, our model explicitly represents two state sequences: a top-level sequence that determines the common structure inherent in the music itself, and a bottom-level sequence that determines the actual temporal fluctuation of each performance. These two sequences are fused into a hierarchical Bayesian HMM and can be learned jointly from the given performances. Since the top-level sequence assigns the same state to note combinations that appear repeatedly within a piece of music, we can unveil the latent structure of the piece. Moreover, we can easily compare different performances of the same piece by analyzing the bottom-level sequences. Experimental evaluation showed that our method outperformed the conventional methods.
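To make the two-level structure concrete, below is a minimal generative sketch in Python of the kind of hierarchical model the abstract describes: a shared top-level state path encoding the piece's structure, and per-performance bottom-level paths whose dwell times model each rendition's tempo fluctuation. Every specific here (state counts, Dirichlet spectral templates, Gaussian emission noise, the left-to-right dwell model) is an illustrative assumption, not the paper's actual parameterization.

```python
# Illustrative sketch only: the distributions, dimensions, and names below are
# assumptions standing in for whatever the paper actually uses.
import numpy as np

rng = np.random.default_rng(0)

N_TOP = 8      # top-level "composition" states (note combinations); assumed
N_PITCH = 12   # chroma-like feature dimension; assumed
N_PERF = 3     # number of performances to align
FRAMES = 200   # frames per performance; assumed equal for simplicity

# --- Top-level sequence: common structure of the piece ----------------------
# A short score-level state path; repeated state indices model note
# combinations that recur within the piece.
top_path = np.array([0, 1, 2, 1, 3, 4, 2, 5])  # hypothetical structure

# Each top-level state emits a characteristic spectral template.
templates = rng.dirichlet(np.ones(N_PITCH), size=N_TOP)

# --- Bottom-level sequences: per-performance temporal fluctuation -----------
# Each performance walks left-to-right through top_path; the self-transition
# (stay) probability controls how long it dwells on each score state, i.e.
# its local tempo.
def sample_performance(stay_prob):
    pos, frames, states = 0, [], []
    for _ in range(FRAMES):
        states.append(top_path[pos])
        # Noisy observation of the current state's template; Gaussian noise
        # is an assumed stand-in for the paper's emission model.
        frames.append(templates[top_path[pos]]
                      + 0.05 * rng.standard_normal(N_PITCH))
        if pos < len(top_path) - 1 and rng.random() > stay_prob:
            pos += 1  # advance to the next score state
    return np.array(frames), np.array(states)

performances = [sample_performance(stay_prob=rng.uniform(0.9, 0.97))
                for _ in range(N_PERF)]

for i, (_, states) in enumerate(performances):
    print(f"performance {i}: first 20 top-level states -> {states[:20]}")
```

Under this view, aligning two performances reduces to matching frames through their shared top-level states rather than comparing raw acoustics directly, which is what lets the model separate structure (repeated states in the top path) from per-performance timing (how long each bottom-level path dwells on each state).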

Original language: English
Pages: 233-238
Number of pages: 6
Publication status: Published - 2014 Jan 1
Event: 15th International Society for Music Information Retrieval Conference, ISMIR 2014 - Taipei, Taiwan, Province of China
Duration: 2014 Oct 27 - 2014 Oct 31

Conference

Conference: 15th International Society for Music Information Retrieval Conference, ISMIR 2014
Country: Taiwan, Province of China
City: Taipei
Period: 14/10/27 - 14/10/31

Fingerprint

Hidden Markov models
Acoustics
Music Composition
Music Performance
Alignment
Generative
Statistical Models
Music

ASJC Scopus subject areas

  • Music
  • Information Systems

Cite this

Maezawa, A., Itoyama, K., Yoshii, K., & Okuno, H. G. (2014). Bayesian audio alignment based on a unified generative model of music composition and performance. 233-238. Paper presented at 15th International Society for Music Information Retrieval Conference, ISMIR 2014, Taipei, Taiwan, Province of China.
