Unsupervised Disentanglement of Timbral, Pitch, and Variation Features From Musical Instrument Sounds With Random Perturbation

Keitaro Tanaka*, Yoshiaki Bando, Kazuyoshi Yoshii, Shigeo Morishima

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes an unsupervised disentangled representation learning method for musical instrument sounds with pitched and unpitched spectra. Since conventional methods have commonly attempted to disentangle timbral features (e.g., instruments) and pitches (e.g., MIDI note numbers and F0s), they can be applied only to pitched sounds. They also treat global timbres unique to instruments and local variations (e.g., expressions and playstyles) without distinction. Instead, we represent the spectrogram of a musical instrument sound with a variational autoencoder (VAE) that has timbral, pitch, and variation features as latent variables. The abstract pitch features are considered to represent the pitch clarity or percussiveness, brightness, and F0s (if they exist). The unsupervised disentanglement is achieved by extracting time-invariant and time-varying features as global timbres and local variations, respectively, from randomly pitch-shifted input sounds, and by extracting time-varying features as local pitch features from randomly timbre-distorted input sounds. To enhance the disentanglement of timbral and variation features from pitch features, input sounds are separated into spectral envelopes and fine structures with cepstrum analysis. The experiments showed that the proposed method can provide effective timbral and pitch features for better musical instrument classification and pitch estimation.
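The cepstrum-based separation mentioned in the abstract can be illustrated with a generic low-quefrency liftering sketch. This is not the authors' implementation: the function name `split_envelope_and_fine_structure`, the cutoff `n_lifter=30`, and the synthetic test tone are all assumptions chosen for illustration, and the paper may use different settings.

```python
import numpy as np


def split_envelope_and_fine_structure(frame, n_lifter=30, eps=1e-8):
    """Split one windowed frame into a log spectral envelope and a log
    fine structure via low-quefrency liftering of the real cepstrum.

    Hypothetical helper; n_lifter is an illustrative cutoff, not a value
    taken from the paper.
    """
    n_fft = len(frame)
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + eps)

    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n=n_fft)

    # Keep the low quefrencies and their mirrored counterparts so the
    # liftered cepstrum stays symmetric and its spectrum stays real.
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0
    lifter[n_fft - n_lifter + 1:] = 1.0

    envelope = np.fft.rfft(cepstrum * lifter).real  # smooth part
    fine_structure = log_mag - envelope             # harmonic detail
    return envelope, fine_structure


# Assumed example input: a 220 Hz tone with ten harmonics, Hann-windowed.
sr = 16000
n_fft = 1024
t = np.arange(n_fft) / sr
frame = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 11))
frame = frame * np.hanning(n_fft)

env, fine = split_envelope_and_fine_structure(frame)
```

In this sketch the envelope carries the slowly varying spectral shape and the fine structure carries the harmonic detail, and the two sum back to the original log-magnitude spectrum.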

Original language: English
Title of host publication: Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 709-716
Number of pages: 8
ISBN (Electronic): 9786165904773
DOIs
Publication status: Published - 2022
Event: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022 - Chiang Mai, Thailand
Duration: 2022 Nov 7 to 2022 Nov 10

Publication series

Name: Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022

Conference

Conference: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Country/Territory: Thailand
City: Chiang Mai
Period: 22/11/7 to 22/11/10

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
