Gibbs sampling based multi-scale mixture model for speaker clustering

Shinji Watanabe, Daichi Mochihashi, Takaaki Hori, Atsushi Nakamura

研究成果: Conference contribution

9 被引用数 (Scopus)

抄録

The aim of this work is to apply a sampling approach to speech modeling, and propose a Gibbs sampling based Multi-scale Mixture Model (M3). The proposed approach focuses on the multi-scale property of speech dynamics, i.e., dynamics in speech can be observed on, for instance, short-time acoustical, linguistic-segmental, and utterance-wise temporal scales. M 3 is an extension of the Gaussian mixture model and is considered a hierarchical mixture model, where mixture components in each time scale will change at intervals of the corresponding time unit. We derive a fully Bayesian treatment of the multi-scale mixture model based on Gibbs sampling. The advantage of the proposed model is that each speaker cluster can be precisely modeled based on the Gaussian mixture model unlike conventional single-Gaussian based speaker clustering (e.g., using the Bayesian Information Criterion (BIC)). In addition, Gibbs sampling offers the potential to avoid a serious local optimum problem. Speaker clustering experiments confirmed these advantages and obtained a significant improvement over the conventional BIC based approaches.

本文言語English
ホスト出版物のタイトル2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
ページ4524-4527
ページ数4
DOI
出版ステータスPublished - 2011 8 18
外部発表はい
イベント36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
継続期間: 2011 5 222011 5 27

出版物シリーズ

名前ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN(印刷版)1520-6149

Conference

Conference36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
CountryCzech Republic
CityPrague
Period11/5/2211/5/27

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

フィンガープリント 「Gibbs sampling based multi-scale mixture model for speaker clustering」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル