Spotting a query phrase from polyphonic music audio signals based on semi-supervised nonnegative matrix factorization

Taro Masuda, Kazuyoshi Yoshii, Masataka Goto, Shigeo Morishima

Research output: Paper

4 Citations (Scopus)

Abstract

This paper proposes a query-by-audio system that aims to detect temporal locations where a musical phrase given as a query is played in musical pieces. The “phrase” in this paper means a short audio excerpt that is not limited to a main melody (singing part) and is usually played by a single musical instrument. A main problem of this task is that the query is often buried in mixture signals consisting of various instruments. To solve this problem, we propose a method that can appropriately calculate the distance between a query and partial components of a musical piece. More specifically, gamma process nonnegative matrix factorization (GaP-NMF) is used for decomposing the spectrogram of the query into an appropriate number of basis spectra and their activation patterns. Semi-supervised GaP-NMF is then used for estimating activation patterns of the learned basis spectra in the musical piece by presuming the piece to partially consist of those spectra. This enables distance calculation based on activation patterns. The experimental results showed that our method outperformed conventional matching methods.
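The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain KL-divergence NMF with multiplicative updates and a fixed, user-chosen number of bases stands in for GaP-NMF (which infers the number of bases), and a sliding-window cosine distance stands in for the paper's distance measure, which the abstract does not specify. The names `kl_nmf` and `spot_query` and all parameters are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def kl_nmf(V, n_free, W_fixed=None, n_iter=200, seed=0):
    """KL-divergence NMF, V ~= W @ H, with multiplicative updates.

    Assumption: stand-in for GaP-NMF. If W_fixed is given, those basis
    spectra are held fixed and only the n_free extra bases are learned
    (the semi-supervised setting).
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W_free = rng.random((n_bins, n_free)) + EPS
    if W_fixed is None:
        W, n_fixed = W_free, 0
    else:
        W, n_fixed = np.hstack([W_fixed, W_free]), W_fixed.shape[1]
    H = rng.random((W.shape[1], n_frames)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Update all activations.
        H *= (W.T @ (V / (W @ H + EPS))) / (W.T @ ones + EPS)
        # Update only the free bases; fixed columns are left untouched.
        ratio = V / (W @ H + EPS)
        W[:, n_fixed:] *= (ratio @ H[n_fixed:].T) / (ones @ H[n_fixed:].T + EPS)
    return W, H


def spot_query(V_query, V_piece, n_query_basis=4, n_extra_basis=8):
    """Return one cosine distance per candidate start frame in the piece."""
    # 1) Learn basis spectra and activations from the query alone.
    W_q, H_q = kl_nmf(V_query, n_query_basis)
    scale = W_q.sum(axis=0, keepdims=True) + EPS
    W_q, H_q = W_q / scale, H_q * scale.T  # move scale into activations
    # 2) Decompose the piece with the query bases fixed plus free bases
    #    that absorb the other instruments.
    _, H_p = kl_nmf(V_piece, n_extra_basis, W_fixed=W_q)
    H_p = H_p[:n_query_basis]  # activations of the query bases only
    # 3) Slide a query-length window and compare activation patterns.
    n_q = H_q.shape[1]
    q = H_q.ravel()
    q_norm = np.linalg.norm(q)
    dists = np.empty(V_piece.shape[1] - n_q + 1)
    for t in range(len(dists)):
        w = H_p[:, t:t + n_q].ravel()
        dists[t] = 1.0 - (q @ w) / (q_norm * np.linalg.norm(w) + EPS)
    return dists
```

Here `V_query` and `V_piece` are nonnegative magnitude spectrograms (frequency bins by frames) computed with the same analysis settings. Low values in the returned array mark candidate start frames of the query phrase.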

Original language: English
Pages: 227-232
Number of pages: 6
Publication status: Published - 2014-01-01
Event: 15th International Society for Music Information Retrieval Conference, ISMIR 2014 - Taipei, Taiwan, Province of China
Duration: 2014-10-27 to 2014-10-31

Conference

Conference: 15th International Society for Music Information Retrieval Conference, ISMIR 2014
Taiwan, Province of China
Taipei
Period: 2014-10-27 to 2014-10-31

Fingerprint

Factorization
Chemical activation
Audio systems
Musical instruments
Music
Polyphonic
Activation

ASJC Scopus subject areas

  • Music
  • Information Systems

Cite this

Masuda, T., Yoshii, K., Goto, M., & Morishima, S. (2014). Spotting a query phrase from polyphonic music audio signals based on semi-supervised nonnegative matrix factorization. 227-232. Paper presented at 15th International Society for Music Information Retrieval Conference, ISMIR 2014, Taipei, Taiwan, Province of China.

@conference{b1d81fe8f7654ac583535f4619b3e876,
title = "Spotting a query phrase from polyphonic music audio signals based on semi-supervised nonnegative matrix factorization",
author = "Taro Masuda and Kazuyoshi Yoshii and Masataka Goto and Shigeo Morishima",
year = "2014",
month = "1",
day = "1",
language = "English",
pages = "227--232",
note = "15th International Society for Music Information Retrieval Conference, ISMIR 2014 ; Conference date: 27-10-2014 Through 31-10-2014",

}

TY - CONF

T1 - Spotting a query phrase from polyphonic music audio signals based on semi-supervised nonnegative matrix factorization

AU - Masuda, Taro

AU - Yoshii, Kazuyoshi

AU - Goto, Masataka

AU - Morishima, Shigeo

PY - 2014/1/1

Y1 - 2014/1/1

UR - http://www.scopus.com/inward/record.url?scp=84973290458&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84973290458&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:84973290458

SP - 227

EP - 232

ER -