Bayesian modelling of the speech spectrum using mixture of Gaussians

Parham Zolfaghari, Shinji Watanabe, Atsushi Nakamura, Shigeru Katagiri

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

This paper presents a method for modelling the speech spectral envelope using a mixture of Gaussians (MOG). A novel variational Bayesian (VB) framework for Gaussian mixture modelling of a histogram enables the derivation of an objective function that can be used to simultaneously optimise both model parameter distributions and model structure. A histogram representation of the STRAIGHT spectral envelope, which is free of glottal excitation information, is used for parametrisation using this MOG model. This results in a parameterisation scheme that purely models the vocal tract resonant characteristics. Maximum likelihood (ML) and variational Bayesian (VB) solutions of the mixture model on histogram data are found using an iterative algorithm. A comparison between ML-MOG and VB-MOG spectral modelling is carried out using spectral distortion measures and mean opinion scores (MOS). The main advantages of VB-MOG highlighted in this paper include better modelling using fewer Gaussians in the mixture resulting in better correspondence of Gaussians and formant-like peaks, and an objective measure of the number of Gaussians required to best fit the spectral envelope.
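The abstract does not give the update equations, so the following is only a minimal sketch of the ML-MOG idea under stated assumptions: a one-dimensional mixture of Gaussians fitted to histogram data with a standard weighted EM iteration, where each bin centre contributes in proportion to its bin height. The function name `fit_mog_to_histogram`, its parameters, the quantile-based initialisation and the variance floor are all hypothetical choices for illustration. The VB variant described in the paper, which also selects the number of Gaussians, is not reproduced here.

```python
import numpy as np


def fit_mog_to_histogram(centers, counts, n_components, n_iter=200, var_floor=1e-6):
    """Weighted-EM (maximum likelihood) fit of a 1-D Gaussian mixture to a histogram.

    centers: bin centres (for a spectral envelope, the frequency axis).
    counts:  non-negative bin heights (the envelope treated as a histogram).
    Returns (weights, means, variances), each of length n_components.
    """
    x = np.asarray(centers, dtype=float)
    w = np.asarray(counts, dtype=float)
    w = w / w.sum()  # normalise bin heights so they sum to one

    # Assumed initialisation: means at evenly spaced quantiles of the histogram,
    # all variances equal to the overall histogram variance, uniform weights.
    cdf = np.cumsum(w)
    quantiles = (np.arange(n_components) + 0.5) / n_components
    means = np.interp(quantiles, cdf, x)
    global_mean = np.sum(w * x)
    variances = np.full(n_components, np.sum(w * (x - global_mean) ** 2))
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: responsibility of each Gaussian for each bin centre,
        # computed in the log domain for numerical stability.
        diff = x[:, None] - means[None, :]
        log_pdf = -0.5 * (np.log(2.0 * np.pi * variances) + diff ** 2 / variances)
        log_r = np.log(weights) + log_pdf
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: the usual EM updates, with every sum over data points
        # weighted by the (normalised) bin height.
        wr = w[:, None] * r
        nk = wr.sum(axis=0) + 1e-300  # guard against empty components
        weights = nk / nk.sum()
        means = (wr * x[:, None]).sum(axis=0) / nk
        diff = x[:, None] - means[None, :]
        variances = np.maximum((wr * diff ** 2).sum(axis=0) / nk, var_floor)

    return weights, means, variances
```

Calling `fit_mog_to_histogram(freqs, envelope, n_components=4)` on a frequency axis and a non-negative envelope returns mixture weights, means and variances; in the setting the abstract describes, the means would play the role of formant-like peak locations. The number of components must be chosen by hand in this ML sketch, which is the limitation the VB formulation is said to address.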

Original language: English
Journal: Unknown Journal
Volume: 1
Publication status: Published - 2004
Externally published: Yes

Fingerprint

histograms
envelopes
Maximum likelihood
Model structures
Parameterization
derivation
excitation

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Acoustics and Ultrasonics

Cite this

Bayesian modelling of the speech spectrum using mixture of Gaussians. / Zolfaghari, Parham; Watanabe, Shinji; Nakamura, Atsushi; Katagiri, Shigeru.

In: Unknown Journal, Vol. 1, 2004.

@article{5d3c1a9ba7674ff59c202a9b0d6598f7,
title = "Bayesian modelling of the speech spectrum using mixture of Gaussians",
abstract = "This paper presents a method for modelling the speech spectral envelope using a mixture of Gaussians (MOG). A novel variational Bayesian (VB) framework for Gaussian mixture modelling of a histogram enables the derivation of an objective function that can be used to simultaneously optimise both model parameter distributions and model structure. A histogram representation of the STRAIGHT spectral envelope, which is free of glottal excitation information, is used for parametrisation using this MOG model. This results in a parameterisation scheme that purely models the vocal tract resonant characteristics. Maximum likelihood (ML) and variational Bayesian (VB) solutions of the mixture model on histogram data are found using an iterative algorithm. A comparison between ML-MOG and VB-MOG spectral modelling is carried out using spectral distortion measures and mean opinion scores (MOS). The main advantages of VB-MOG highlighted in this paper include better modelling using fewer Gaussians in the mixture resulting in better correspondence of Gaussians and formant-like peaks, and an objective measure of the number of Gaussians required to best fit the spectral envelope.",
author = "Parham Zolfaghari and Shinji Watanabe and Atsushi Nakamura and Shigeru Katagiri",
year = "2004",
language = "English",
volume = "1",
journal = "Nuclear Physics A",
issn = "0375-9474",
publisher = "Elsevier",

}

TY - JOUR

T1 - Bayesian modelling of the speech spectrum using mixture of Gaussians

AU - Zolfaghari, Parham

AU - Watanabe, Shinji

AU - Nakamura, Atsushi

AU - Katagiri, Shigeru

PY - 2004

Y1 - 2004

N2 - This paper presents a method for modelling the speech spectral envelope using a mixture of Gaussians (MOG). A novel variational Bayesian (VB) framework for Gaussian mixture modelling of a histogram enables the derivation of an objective function that can be used to simultaneously optimise both model parameter distributions and model structure. A histogram representation of the STRAIGHT spectral envelope, which is free of glottal excitation information, is used for parametrisation using this MOG model. This results in a parameterisation scheme that purely models the vocal tract resonant characteristics. Maximum likelihood (ML) and variational Bayesian (VB) solutions of the mixture model on histogram data are found using an iterative algorithm. A comparison between ML-MOG and VB-MOG spectral modelling is carried out using spectral distortion measures and mean opinion scores (MOS). The main advantages of VB-MOG highlighted in this paper include better modelling using fewer Gaussians in the mixture resulting in better correspondence of Gaussians and formant-like peaks, and an objective measure of the number of Gaussians required to best fit the spectral envelope.

UR - http://www.scopus.com/inward/record.url?scp=4544260276&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4544260276&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:4544260276

VL - 1

JO - Nuclear Physics A

JF - Nuclear Physics A

SN - 0375-9474

ER -