A sampling-based speaker clustering using utterance-oriented Dirichlet process mixture model and its evaluation on large-scale data

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

    Research output: Contribution to journal, Article

    Abstract

    An infinite mixture model is applied to model-based speaker clustering with sampling-based optimization, which makes it possible to estimate the number of speakers. To this end, non-parametric Bayesian modeling is implemented with Markov chain Monte Carlo and incorporated into the utterance-oriented speaker model. The proposed model is called the utterance-oriented Dirichlet process mixture model (UO-DPMM). The paper demonstrates that UO-DPMM can be applied successfully to large-scale data and that it outperforms conventional hierarchical agglomerative clustering, especially for large numbers of utterances.
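The abstract does not spell out the sampler, so the following is only a minimal sketch of the general idea: collapsed Gibbs sampling for a Dirichlet process mixture, in which the number of clusters is inferred rather than fixed in advance. It is not the UO-DPMM of the paper. As simplifying assumptions, each utterance is reduced to a single scalar feature, clusters are Gaussian with known variance `sigma2`, and cluster means have a conjugate Normal prior (`mu0`, `tau2`). The function names and all default values are hypothetical.

```python
import math
import random


def log_predictive(x, n, total, mu0=0.0, tau2=100.0, sigma2=1.0):
    """Log predictive density of x for a cluster holding n points with sum `total`.

    The cluster mean is integrated out under a Normal(mu0, tau2) prior;
    n = 0 gives the prior predictive used for a brand-new cluster.
    """
    precision = 1.0 / tau2 + n / sigma2
    mean = (mu0 / tau2 + total / sigma2) / precision
    var = sigma2 + 1.0 / precision
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def gibbs_dpmm(data, alpha=1.0, sweeps=50, seed=0):
    """Return one cluster label per data point after `sweeps` Gibbs sweeps."""
    rng = random.Random(seed)
    labels = [0] * len(data)          # start with everything in one cluster
    counts = {0: len(data)}           # cluster label -> number of members
    sums = {0: sum(data)}             # cluster label -> sum of member values
    next_label = 1                    # label reserved for the next new cluster
    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster.
            k = labels[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            # Chinese restaurant process weights: existing clusters are
            # weighted by their size, a new cluster by alpha.
            candidates = list(counts)
            log_w = [math.log(counts[c]) + log_predictive(x, counts[c], sums[c])
                     for c in candidates]
            candidates.append(next_label)
            log_w.append(math.log(alpha) + log_predictive(x, 0, 0.0))
            # Normalise in log space for numerical stability, then sample.
            top = max(log_w)
            weights = [math.exp(w - top) for w in log_w]
            k = rng.choices(candidates, weights=weights)[0]
            if k == next_label:
                counts[k] = 0
                sums[k] = 0.0
                next_label += 1
            counts[k] += 1
            sums[k] += x
            labels[i] = k
    return labels
```

Calling `gibbs_dpmm(values)` on a list of floats returns one integer label per value; the number of distinct labels is the sampled estimate of the number of clusters. Larger `alpha` makes new clusters more likely.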

    Original language: English
    Journal: APSIPA Transactions on Signal and Information Processing
    Volume: 4
    DOI: 10.1017/ATSIP.2015.19
    Publication status: Published - 2015 Oct 28

    Fingerprint

    Sampling
    Markov processes

    Keywords

    • Gibbs sampling
    • Non-parametric Bayesian model
    • Sampling approach
    • Speaker clustering
    • Utterance-oriented Dirichlet process mixture model

    ASJC Scopus subject areas

    • Information Systems
    • Signal Processing

    Cite this

    @article{5894ddad34504b858dd96041b231e204,
    title = "A sampling-based speaker clustering using utterance-oriented Dirichlet process mixture model and its evaluation on large-scale data",
    abstract = "An infinite mixture model is applied to model-based speaker clustering with sampling-based optimization to make it possible to estimate the number of speakers. For this purpose, a framework of non-parametric Bayesian modeling is implemented with the Markov chain Monte Carlo and incorporated in the utterance-oriented speaker model. The proposed model is called the utterance-oriented Dirichlet process mixture model (UO-DPMM). The present paper demonstrates that UO-DPMM is successfully applied on large-scale data and outperforms the conventional hierarchical agglomerative clustering, especially for large amounts of utterances.",
    keywords = "Gibbs sampling, Non-parametric Bayesian model, Sampling approach, Speaker clustering, Utterance-oriented Dirichlet process mixture model",
    author = "Naohiro Tawara and Tetsuji Ogawa and Shinji Watanabe and Atsushi Nakamura and Tetsunori Kobayashi",
    year = "2015",
    month = "10",
    day = "28",
    doi = "10.1017/ATSIP.2015.19",
    language = "English",
    volume = "4",
    journal = "APSIPA Transactions on Signal and Information Processing",
    issn = "2048-7703",
    publisher = "Cambridge University Press",
    }

    TY - JOUR
    T1 - A sampling-based speaker clustering using utterance-oriented Dirichlet process mixture model and its evaluation on large-scale data
    AU - Tawara, Naohiro
    AU - Ogawa, Tetsuji
    AU - Watanabe, Shinji
    AU - Nakamura, Atsushi
    AU - Kobayashi, Tetsunori
    PY - 2015/10/28
    Y1 - 2015/10/28
    AB - An infinite mixture model is applied to model-based speaker clustering with sampling-based optimization to make it possible to estimate the number of speakers. For this purpose, a framework of non-parametric Bayesian modeling is implemented with the Markov chain Monte Carlo and incorporated in the utterance-oriented speaker model. The proposed model is called the utterance-oriented Dirichlet process mixture model (UO-DPMM). The present paper demonstrates that UO-DPMM is successfully applied on large-scale data and outperforms the conventional hierarchical agglomerative clustering, especially for large amounts of utterances.
    KW - Gibbs sampling
    KW - Non-parametric Bayesian model
    KW - Sampling approach
    KW - Speaker clustering
    KW - Utterance-oriented Dirichlet process mixture model
    UR - http://www.scopus.com/inward/record.url?scp=84949294383&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=84949294383&partnerID=8YFLogxK
    U2 - 10.1017/ATSIP.2015.19
    DO - 10.1017/ATSIP.2015.19
    M3 - Article
    VL - 4
    JO - APSIPA Transactions on Signal and Information Processing
    JF - APSIPA Transactions on Signal and Information Processing
    SN - 2048-7703
    ER -