Comparing features for forming music streams in automatic music transcription

Yohei Sakuraba*, Tetsuro Kitahara, Hiroshi G. Okuno

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)

Abstract

In forming temporal sequences of notes played by the same instrument (referred to as 'music streams'), the timbre of musical instruments may be a predominant feature. In polyphonic music, the performance of timbre extraction based on power-related features deteriorates, because such features are blurred when two or more frequency components are superimposed at the same frequency. To cope with this problem, we integrated timbre similarity and direction proximity with success, but left the use of other features as future work. In this paper, we investigate four features (timbre similarity, direction proximity, pitch transition, and pitch relation consistency) to clarify the precedence among them in music stream formation. Experimental results with quartet music show that direction proximity is the most dominant feature and pitch transition is the second most dominant. In addition, the performance of music stream formation improved from 63.3% using only timbre similarity to 84.9% by integrating the four features.

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 4
Publication status: Published - 2004
Externally published: Yes
Event: Proceedings - IEEE International Conference on Acoustics, Speech, and Signal Processing - Montreal, Que, Canada
Duration: 2004 May 17 → 2004 May 21

Other

Other: Proceedings - IEEE International Conference on Acoustics, Speech, and Signal Processing
Country/Territory: Canada
City: Montreal, Que
Period: 04/5/17 → 04/5/21

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Acoustics and Ultrasonics
