Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

We propose a novel speaker clustering method based on a hierarchically structured utterance-oriented Dirichlet process mixture model. In the proposed method, the number of speakers is determined from the given data in a nonparametric Bayesian manner, and intra-speaker variability is handled by multi-scale mixture modeling. Experimental results showed that the proposed method is computationally efficient and effective in speaker clustering. The proposed method significantly improves the accuracy of speaker clustering systems compared with the conventional method, particularly when the number of utterances varies from speaker to speaker.
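As an illustration of how a Dirichlet process prior lets the number of clusters be inferred rather than fixed in advance, the following is a minimal sketch of the Chinese restaurant process, a standard representation of the Dirichlet process prior over partitions. It is not the hierarchical utterance-oriented model or the Gibbs sampler from the paper: it draws from the prior only, with no acoustic likelihood, and the function name `sample_crp_assignments` and the parameter `alpha` (concentration) are assumptions introduced here.

```python
import random


def sample_crp_assignments(num_utterances, alpha, seed=0):
    """Draw cluster labels from a Chinese restaurant process prior.

    Hypothetical helper for illustration only; no acoustic features or
    likelihood terms are involved, so this is a prior draw, not inference.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of utterances currently in cluster k
    for _ in range(num_utterances):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster (a new "speaker")
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = sample_crp_assignments(num_utterances=100, alpha=1.0)
    print("clusters drawn:", len(counts))
    print("cluster sizes:", counts)
```

In a full sampler, each weight would additionally be multiplied by the likelihood of the utterance under the corresponding cluster, which is where a model such as the one described in the abstract would enter.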

Original language: English
Title of host publication: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Pages: 2163-2166
Number of pages: 4
Publication status: Published - 2012 Dec 1
Event: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 - Portland, OR, United States
Duration: 2012 Sep 9 to 2012 Sep 13

Publication series

Name: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Volume: 3

Conference

Conference: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Country: United States
City: Portland, OR
Period: 12/9/9 to 12/9/13

Keywords

  • Gibbs sampling
  • Nonparametric Bayesian model
  • Speaker clustering
  • Utterance-oriented Dirichlet process mixture model

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Communication

Cite this

Tawara, N., Ogawa, T., Watanabe, S., Nakamura, A., & Kobayashi, T. (2012). Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model. In 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 (pp. 2163-2166). (13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012; Vol. 3).