Speech enhancement using non-negative spectrogram models with mel-generalized cepstral regularization

Li Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

Spectral-domain speech enhancement algorithms based on non-negative spectrogram models, such as non-negative matrix factorization (NMF) and non-negative matrix factor deconvolution, are powerful in terms of signal recovery accuracy; however, they do not directly lead to an enhancement in the feature domain (e.g., the cepstral domain) or in terms of perceived quality. We have previously proposed a method that makes it possible to enhance speech in the spectral and cepstral domains simultaneously. Although this method was shown to be effective, the devised algorithm was computationally demanding. This paper proposes another formulation that allows for a fast implementation by replacing the regularization term with a divergence measure between the NMF model and the mel-generalized cepstral (MGC) representation of the target spectrum. Since the MGC is an auditory-motivated representation of an audio signal that is widely used in parametric speech synthesis, we also expect the proposed method to improve perceived quality. Experimental results revealed the effectiveness of the proposed method in terms of both the signal-to-distortion ratio and the cepstral distance.
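For readers unfamiliar with the NMF baseline that the abstract builds on, the following is a minimal sketch of plain NMF with multiplicative updates for the generalized Kullback-Leibler divergence, followed by a Wiener-like soft mask. It is not the authors' algorithm: the MGC regularization term is omitted entirely, and the function names (`kl_nmf`, `kl_divergence`, `soft_mask`), the synthetic input, and the split of basis columns into "speech" and "noise" are all assumptions made for illustration. In a real system the speech bases would typically be trained on clean speech beforehand.

```python
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero and log(0)


def kl_divergence(V, V_hat):
    """Generalized KL divergence between two non-negative matrices."""
    return float(np.sum(V * np.log((V + EPS) / (V_hat + EPS)) - V + V_hat))


def kl_nmf(V, rank, n_iter=100, seed=0):
    """Approximate V (freq x time) by W @ H with non-negative factors,
    using the standard multiplicative updates for the KL divergence."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 0.1  # spectral bases
    H = rng.random((rank, V.shape[1])) + 0.1  # time-varying activations
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
        W *= ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H


def soft_mask(W, H, speech_idx):
    """Wiener-like mask: share of the model explained by the 'speech' bases.
    Which columns count as speech is a hypothetical choice made by the caller."""
    speech = W[:, speech_idx] @ H[speech_idx, :]
    return speech / (W @ H + EPS)


if __name__ == "__main__":
    # Synthetic non-negative "spectrogram" standing in for a noisy magnitude STFT.
    V = np.random.default_rng(1).random((64, 100)) + 0.01
    W, H = kl_nmf(V, rank=8)
    mask = soft_mask(W, H, speech_idx=[0, 1, 2, 3])  # assumed speech bases
    enhanced = mask * V  # masked spectrogram estimate
    print("KL divergence after fitting:", kl_divergence(V, W @ H))
    print("enhanced shape:", enhanced.shape)
```

A regularized variant, such as the one the abstract describes, would add a penalty term to the objective and modify the update rules accordingly; the sketch above only covers the unregularized factorization and masking steps.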

Original language: English
Pages (from-to): 1998-2002
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
DOIs
Publication status: Published - 2017
Externally published: Yes
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: 2017 Aug 20 → 2017 Aug 24

Keywords

  • Mel-generalized cepstral representation
  • Non-negative matrix factorization
  • Single channel signal processing
  • Speech enhancement

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
