In search of a perceptual metric for timbre: Dissimilarity judgments among synthetic sounds with MFCC-derived spectral envelopes

Hiroko Terasawa, Jonathan Berger, Shoji Makino

Research output: Contribution to journalArticlepeer-review

8 Citations (Scopus)

Abstract

This paper presents a quantitative metric to describe the multidimensionality of spectral envelope perception, that is, the perception specifically related to the spectral element of timbre. Mel-cepstrum (Mel-frequency cepstral coefficients or MFCCs) is chosen as a hypothetical metric for spectral envelope perception due to its desirable properties of linearity, orthogonality, and multidimensionality. The experimental results confirmed the relevance of Mel-cepstrum to the perceived timbre dissimilarity when the spectral envelopes of complex-tone synthetic sounds were systematically controlled. The first experiment measured the perceived dissimilarity when the stimuli were synthesized by varying only a single coefficient from MFCC. Linear regression analysis proved that each of the 12 MFCCs has a linear correlation with spectral envelope perception. The second experiment measured the perceived dissimilarity when the stimuli were synthesized by varying two of the MFCCs. Multiple regression analysis showed that the perceived dissimilarity can be explained in terms of the Euclidean distance of the MFCC values of the synthetic sounds. The quantitative and perceptual relevance between the MFCCs and spectral centroids is also discussed. These results suggest that MFCCs can be a metric representation of spectral envelope perception, where each of its orthogonal basis functions provides a linear match with human perception.

Original languageEnglish
Pages (from-to)674-685
Number of pages12
JournalAES: Journal of the Audio Engineering Society
Volume60
Issue number9
Publication statusPublished - 2012 Sep
Externally publishedYes

ASJC Scopus subject areas

  • Engineering(all)
  • Music

Fingerprint Dive into the research topics of 'In search of a perceptual metric for timbre: Dissimilarity judgments among synthetic sounds with MFCC-derived spectral envelopes'. Together they form a unique fingerprint.

Cite this