Comparing features for forming music streams in automatic music transcription

Yohei Sakuraba, Tetsuro Kitahara, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

11 Citations (Scopus)

Abstract

In forming temporal sequences of notes played by the same instrument (referred to as 'music streams'), the timbre of musical instruments may be a predominant feature. In polyphonic music, however, the performance of timbre extraction based on power-related features deteriorates, because such features are blurred when two or more frequency components are superimposed at the same frequency. To cope with this problem, we successfully integrated timbre similarity and direction proximity, but left the use of other features as future work. In this paper, we investigate four features (timbre similarity, direction proximity, pitch transition, and pitch relation consistency) to clarify their precedence in music stream formation. Experimental results with quartet music show that direction proximity is the most dominant feature and pitch transition is the second most dominant. In addition, integrating all four features improved the performance of music stream formation from 63.3% (timbre similarity alone) to 84.9%.
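The abstract does not say how the four features are combined. As a purely illustrative sketch, assuming each feature is turned into a score and the scores are combined by a weighted sum, the snippet below assigns the simultaneous notes of one frame to streams by trying every note-to-stream assignment and keeping the best one. The `Note` fields, the scoring functions, `WEIGHTS`, and `assign_frame` are hypothetical and not taken from the paper; in particular, "pitch relation consistency" is modelled here, as an assumption, as a penalty for voice crossing.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass
class Note:
    pitch: int          # MIDI note number (hypothetical representation)
    direction: float    # estimated source direction in degrees (hypothetical)
    timbre: tuple       # timbre feature vector (hypothetical)


# Hypothetical equal weights for the four features; not values from the paper.
WEIGHTS = {"timbre": 1.0, "direction": 1.0, "transition": 1.0, "relation": 1.0}


def timbre_similarity(prev, note):
    # Negative Euclidean distance: more similar timbre vectors score higher.
    return -math.dist(prev.timbre, note.timbre)


def direction_proximity(prev, note):
    # Notes arriving from nearby directions score higher.
    return -abs(prev.direction - note.direction)


def pitch_transition(prev, note):
    # Smaller melodic leaps from the stream's previous note score higher.
    return -abs(prev.pitch - note.pitch)


def pitch_relation_consistency(pairs):
    # Assumption: stream 0 is the highest voice, stream 1 the next, and so on.
    # Each pair of streams whose pitches violate that order costs one point.
    ordered = sorted(pairs, key=lambda p: p[0])
    crossings = sum(
        1
        for (_, upper), (_, lower) in itertools.combinations(ordered, 2)
        if upper.pitch < lower.pitch
    )
    return -crossings


def assign_frame(last_notes, frame, weights=WEIGHTS):
    """Assign each note in `frame` to a distinct stream.

    last_notes[k] is the most recent note of stream k. Returns a tuple whose
    i-th entry is the stream index chosen for frame[i].
    """
    best_score, best_perm = -math.inf, None
    # Exhaustive search is cheap for a quartet: at most 4! = 24 assignments.
    for perm in itertools.permutations(range(len(last_notes)), len(frame)):
        pairs = list(zip(perm, frame))
        score = weights["relation"] * pitch_relation_consistency(pairs)
        for k, note in pairs:
            prev = last_notes[k]
            score += (weights["timbre"] * timbre_similarity(prev, note)
                      + weights["direction"] * direction_proximity(prev, note)
                      + weights["transition"] * pitch_transition(prev, note))
        if score > best_score:
            best_score, best_perm = score, perm
    return best_perm


# Usage: four streams (e.g. violin 1, violin 2, viola, cello) and one new frame
# of simultaneous notes given in shuffled order. All values are made up.
last = [
    Note(76, -45.0, (1.0, 0.0)),
    Note(69, -15.0, (0.9, 0.1)),
    Note(62, 15.0, (0.5, 0.5)),
    Note(48, 45.0, (0.0, 1.0)),
]
frame = [
    Note(50, 44.0, (0.0, 1.0)),
    Note(77, -44.0, (1.0, 0.0)),
    Note(64, 16.0, (0.5, 0.5)),
    Note(67, -14.0, (0.9, 0.1)),
]
result = assign_frame(last, frame)
print(result)  # (3, 0, 2, 1)
```

Exhaustive search over permutations is used here only because a quartet has at most four simultaneous voices; its cost grows factorially with the number of streams, so a larger ensemble would need a different assignment strategy.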

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 4
Publication status: Published - 2004
Externally published: Yes
Event: Proceedings - IEEE International Conference on Acoustics, Speech, and Signal Processing - Montreal, Que, Canada
Duration: 2004 May 17 to 2004 May 21

Other

Other: Proceedings - IEEE International Conference on Acoustics, Speech, and Signal Processing
Country: Canada
City: Montreal, Que
Period: 2004-05-17 to 2004-05-21

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Acoustics and Ultrasonics

Cite this

Sakuraba, Y., Kitahara, T., & Okuno, H. G. (2004). Comparing features for forming music streams in automatic music transcription. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 4)

Comparing features for forming music streams in automatic music transcription. / Sakuraba, Yohei; Kitahara, Tetsuro; Okuno, Hiroshi G.

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Vol. 4 2004.

Sakuraba, Y, Kitahara, T & Okuno, HG 2004, Comparing features for forming music streams in automatic music transcription. in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. vol. 4, Proceedings - IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, Que, Canada, 04/5/17.
Sakuraba Y, Kitahara T, Okuno HG. Comparing features for forming music streams in automatic music transcription. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Vol. 4. 2004
Sakuraba, Yohei ; Kitahara, Tetsuro ; Okuno, Hiroshi G. / Comparing features for forming music streams in automatic music transcription. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Vol. 4 2004.
@inproceedings{e35c9809a00f4cfe8cda1921485c1e3f,
title = "Comparing features for forming music streams in automatic music transcription",
abstract = "In forming temporal sequences of notes played by the same instrument (referred to as `music streams'), the timbre of musical instruments may be a predominant feature. In polyphonic music, however, the performance of timbre extraction based on power-related features deteriorates, because such features are blurred when two or more frequency components are superimposed at the same frequency. To cope with this problem, we successfully integrated timbre similarity and direction proximity, but left the use of other features as future work. In this paper, we investigate four features (timbre similarity, direction proximity, pitch transition, and pitch relation consistency) to clarify their precedence in music stream formation. Experimental results with quartet music show that direction proximity is the most dominant feature and pitch transition is the second most dominant. In addition, integrating all four features improved the performance of music stream formation from 63.3{\%} (timbre similarity alone) to 84.9{\%}.",
author = "Yohei Sakuraba and Tetsuro Kitahara and Okuno, {Hiroshi G.}",
year = "2004",
language = "English",
volume = "4",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

}

TY - GEN

T1 - Comparing features for forming music streams in automatic music transcription

AU - Sakuraba, Yohei

AU - Kitahara, Tetsuro

AU - Okuno, Hiroshi G.

PY - 2004

Y1 - 2004

N2 - In forming temporal sequences of notes played by the same instrument (referred to as 'music streams'), the timbre of musical instruments may be a predominant feature. In polyphonic music, however, the performance of timbre extraction based on power-related features deteriorates, because such features are blurred when two or more frequency components are superimposed at the same frequency. To cope with this problem, we successfully integrated timbre similarity and direction proximity, but left the use of other features as future work. In this paper, we investigate four features (timbre similarity, direction proximity, pitch transition, and pitch relation consistency) to clarify their precedence in music stream formation. Experimental results with quartet music show that direction proximity is the most dominant feature and pitch transition is the second most dominant. In addition, integrating all four features improved the performance of music stream formation from 63.3% (timbre similarity alone) to 84.9%.

AB - In forming temporal sequences of notes played by the same instrument (referred to as 'music streams'), the timbre of musical instruments may be a predominant feature. In polyphonic music, however, the performance of timbre extraction based on power-related features deteriorates, because such features are blurred when two or more frequency components are superimposed at the same frequency. To cope with this problem, we successfully integrated timbre similarity and direction proximity, but left the use of other features as future work. In this paper, we investigate four features (timbre similarity, direction proximity, pitch transition, and pitch relation consistency) to clarify their precedence in music stream formation. Experimental results with quartet music show that direction proximity is the most dominant feature and pitch transition is the second most dominant. In addition, integrating all four features improved the performance of music stream formation from 63.3% (timbre similarity alone) to 84.9%.

UR - http://www.scopus.com/inward/record.url?scp=4544229825&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4544229825&partnerID=8YFLogxK

M3 - Conference contribution

VL - 4

BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

ER -