This paper presents a new probabilistic model that can align multiple performances of a particular piece of music. Conventionally, dynamic time warping (DTW) and left-to-right hidden Markov models (HMMs) have often been used for audio-to-audio alignment based on shallow acoustic similarity between performances. Those methods, however, cannot separate the latent musical structure common to all performances from the temporal dynamics unique to each performance. To solve this problem, our model explicitly represents two state sequences: a top-level sequence that determines the common structure inherent in the music itself, and a bottom-level sequence that determines the actual temporal fluctuation of each performance. These two sequences are fused into a hierarchical Bayesian HMM and can be learned simultaneously from the given performances. Since the top-level sequence assigns the same state to note combinations that appear repeatedly within a piece of music, we can unveil the latent structure of the piece. Moreover, we can easily compare different performances of the same piece by analyzing the bottom-level sequences. Experimental evaluation showed that our method outperformed the conventional methods.
|Publication status||Published - 1 Jan 2014|
|Event||15th International Society for Music Information Retrieval Conference, ISMIR 2014 - Taipei, Taiwan, Province of China|
Duration: 27 Oct 2014 → 31 Oct 2014