Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

    Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

    1 Citation (Scopus)

    Abstract

    A novel sampling method is proposed for estimating a continuous multi-scale mixture model. The multi-scale mixture models we assume have a hierarchical structure in which each component of the mixture is itself a Gaussian mixture model (GMM). In speaker modeling from speech, each GMM represents intra-speaker dynamics arising from differences in attributes such as phoneme context and the presence of non-stationary noise, while the mixture of GMMs (MoGMMs) represents inter-speaker dynamics arising from differences between speakers. Gibbs sampling is a powerful technique for estimating such hierarchically structured models, but, depending on how it is applied, it can easily become trapped in local optima, especially when the constituent GMMs have a complex structure. To solve this problem, we propose an accurate and robust sampling method based on blocked Gibbs sampling and iterative conditional modes (ICM), and apply it to mitigate the singular solutions that arise in models with complex multi-modal distributions. In speaker clustering experiments under non-stationary noise, the proposed sampling-based model estimation improved clustering performance by 17% on average compared with conventional sampling-based methods.
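    The abstract describes two levels of latent structure: a speaker-level label for each utterance, and a frame-level Gaussian component inside that speaker's GMM. The Python sketch below illustrates, under strong simplifying assumptions, what a blocked update and its ICM variant can look like for such a mixture of GMMs. It assumes 1-D features, fixed (not sampled) GMM parameters and speaker weights, and a fixed number of speakers; all function names (`update_utterance`, `choose`, and the helpers) are hypothetical. For one utterance it first picks the speaker label with every frame's component label summed out, then picks each frame's component label given that speaker, which amounts to drawing the pair jointly as one block. Setting `use_icm=True` replaces every random draw with the most probable value, as ICM does. This is not the authors' algorithm, which is a fully Bayesian treatment that also estimates the model parameters.

    ```python
    import math
    import random


    def log_normal(x, mean, var):
        """Log density of a 1-D Gaussian N(x; mean, var)."""
        return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


    def log_sum_exp(values):
        """Numerically stable log(sum(exp(v)))."""
        m = max(values)
        return m + math.log(sum(math.exp(v - m) for v in values))


    def frame_log_terms(x, gmm):
        """Per-component log(weight * N(x)) for one frame under one speaker GMM."""
        return [math.log(w) + log_normal(x, mu, var) for (w, mu, var) in gmm]


    def choose(log_scores, rng, use_icm):
        """Draw an index from softmax(log_scores), or take the argmax under ICM."""
        if use_icm:
            return max(range(len(log_scores)), key=lambda i: log_scores[i])
        total = log_sum_exp(log_scores)
        r = rng.random()
        acc = 0.0
        for i, s in enumerate(log_scores):
            acc += math.exp(s - total)
            if r < acc:
                return i
        return len(log_scores) - 1  # guard against floating-point round-off


    def update_utterance(frames, speaker_gmms, speaker_weights, rng, use_icm=False):
        """Blocked update of (speaker label z, frame component labels v) for one utterance.

        z is chosen from p(z | frames) with every frame's component label summed
        out, so all frames of the utterance move to a new speaker together; each
        v_t is then chosen from p(v_t | x_t, z). Together this draws (z, v) jointly.
        """
        speaker_scores = []
        for gmm, pi in zip(speaker_gmms, speaker_weights):
            ll = sum(log_sum_exp(frame_log_terms(x, gmm)) for x in frames)
            speaker_scores.append(math.log(pi) + ll)
        z = choose(speaker_scores, rng, use_icm)
        # Given z, frame component labels are conditionally independent.
        v = [choose(frame_log_terms(x, speaker_gmms[z]), rng, use_icm) for x in frames]
        return z, v


    # Hypothetical toy setup: two speakers, each a 2-component 1-D GMM
    # given as (weight, mean, variance) triples.
    speaker_gmms = [
        [(0.5, -5.0, 1.0), (0.5, -3.0, 1.0)],
        [(0.5, 3.0, 1.0), (0.5, 5.0, 1.0)],
    ]
    speaker_weights = [0.5, 0.5]
    utterance = [4.8, 3.1, 5.2, 2.9]

    z, v = update_utterance(utterance, speaker_gmms, speaker_weights,
                            random.Random(0), use_icm=True)
    print(z, v)  # 1 [1, 0, 1, 0]
    ```

    Choosing the speaker label with the frame-level labels marginalized, rather than updating the speaker label and the frame labels one variable at a time, keeps an utterance's current frame assignments from pinning it to its current speaker. As general background on blocking (not the paper's own argument), this tends to help a sampler move between strongly correlated configurations.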

    Original language: English
    Title of host publication: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
    DOI: 10.1109/MLSP.2013.6661902
    Publication status: Published - 2013
    Event: 2013 16th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2013, Southampton
    Duration: 2013 Sep 22 to 2013 Sep 25

    Other

    Other: 2013 16th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2013
    City: Southampton
    Period: 13/9/22 to 13/9/25

    Keywords

    • blocked Gibbs sampling
    • Fully Bayesian approach
    • iterative conditional modes
    • multi-scale mixture model
    • speaker clustering

    ASJC Scopus subject areas

    • Human-Computer Interaction
    • Signal Processing

    Cite this

    Tawara, N., Ogawa, T., Watanabe, S., Nakamura, A., & Kobayashi, T. (2013). Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data. In IEEE International Workshop on Machine Learning for Signal Processing, MLSP [6661902] https://doi.org/10.1109/MLSP.2013.6661902

    Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data. / Tawara, Naohiro; Ogawa, Tetsuji; Watanabe, Shinji; Nakamura, Atsushi; Kobayashi, Tetsunori.

    IEEE International Workshop on Machine Learning for Signal Processing, MLSP. 2013. 6661902.

    Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

    Tawara, N, Ogawa, T, Watanabe, S, Nakamura, A & Kobayashi, T 2013, Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data. in IEEE International Workshop on Machine Learning for Signal Processing, MLSP, 6661902, 2013 16th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2013, Southampton, 13/9/22. https://doi.org/10.1109/MLSP.2013.6661902
    Tawara N, Ogawa T, Watanabe S, Nakamura A, Kobayashi T. Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data. In IEEE International Workshop on Machine Learning for Signal Processing, MLSP. 2013. 6661902 https://doi.org/10.1109/MLSP.2013.6661902
    Tawara, Naohiro ; Ogawa, Tetsuji ; Watanabe, Shinji ; Nakamura, Atsushi ; Kobayashi, Tetsunori. / Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data. IEEE International Workshop on Machine Learning for Signal Processing, MLSP. 2013.
    @inproceedings{2d7fd043155b4c2dac457df01076e75d,
    title = "Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data",
    abstract = "A novel sampling method is proposed for estimating a continuous multi-scale mixture model. The multi-scale mixture models we assume have a hierarchical structure in which each component of the mixture is represented by a Gaussian mixture model (GMM). In speaker modeling from speech, this GMM represents intra-speaker dynamics derived from the difference in the attributes such as phoneme contexts and the existence of non-stationary noise and the mixture of GMMs (MoGMMs) represents inter-speaker dynamics derived from the difference in speakers. Gibbs sampling is a powerful technique to estimate such hierarchically structured models but can easily induce the local optima problem depending on its use especially when the elemental GMMs are complex in structure. To solve this problem, a highly accurate and robust sampling method based on the blocked Gibbs sampling and iterative conditional modes (ICM) is proposed and effectively applied for reducing a singularity solution given in the model with complex multi-modal distributions. In speaker clustering experiments under non-stationary noise, the proposed sampling-based model estimation improved the clustering performance by 17{\%} on average compared to the conventional sampling-based methods.",
    keywords = "blocked Gibbs sampling, Fully Bayesian approach, iterative conditional modes, multi-scale mixture model, speaker clustering",
    author = "Naohiro Tawara and Tetsuji Ogawa and Shinji Watanabe and Atsushi Nakamura and Tetsunori Kobayashi",
    year = "2013",
    doi = "10.1109/MLSP.2013.6661902",
    language = "English",
    isbn = "9781479911806",
    booktitle = "IEEE International Workshop on Machine Learning for Signal Processing, MLSP",

    }

    TY  - GEN
    T1  - Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data
    AU  - Tawara, Naohiro
    AU  - Ogawa, Tetsuji
    AU  - Watanabe, Shinji
    AU  - Nakamura, Atsushi
    AU  - Kobayashi, Tetsunori
    PY  - 2013
    Y1  - 2013
    N2  - A novel sampling method is proposed for estimating a continuous multi-scale mixture model. The multi-scale mixture models we assume have a hierarchical structure in which each component of the mixture is represented by a Gaussian mixture model (GMM). In speaker modeling from speech, this GMM represents intra-speaker dynamics derived from the difference in the attributes such as phoneme contexts and the existence of non-stationary noise and the mixture of GMMs (MoGMMs) represents inter-speaker dynamics derived from the difference in speakers. Gibbs sampling is a powerful technique to estimate such hierarchically structured models but can easily induce the local optima problem depending on its use especially when the elemental GMMs are complex in structure. To solve this problem, a highly accurate and robust sampling method based on the blocked Gibbs sampling and iterative conditional modes (ICM) is proposed and effectively applied for reducing a singularity solution given in the model with complex multi-modal distributions. In speaker clustering experiments under non-stationary noise, the proposed sampling-based model estimation improved the clustering performance by 17% on average compared to the conventional sampling-based methods.
    AB  - A novel sampling method is proposed for estimating a continuous multi-scale mixture model. The multi-scale mixture models we assume have a hierarchical structure in which each component of the mixture is represented by a Gaussian mixture model (GMM). In speaker modeling from speech, this GMM represents intra-speaker dynamics derived from the difference in the attributes such as phoneme contexts and the existence of non-stationary noise and the mixture of GMMs (MoGMMs) represents inter-speaker dynamics derived from the difference in speakers. Gibbs sampling is a powerful technique to estimate such hierarchically structured models but can easily induce the local optima problem depending on its use especially when the elemental GMMs are complex in structure. To solve this problem, a highly accurate and robust sampling method based on the blocked Gibbs sampling and iterative conditional modes (ICM) is proposed and effectively applied for reducing a singularity solution given in the model with complex multi-modal distributions. In speaker clustering experiments under non-stationary noise, the proposed sampling-based model estimation improved the clustering performance by 17% on average compared to the conventional sampling-based methods.
    KW  - blocked Gibbs sampling
    KW  - Fully Bayesian approach
    KW  - iterative conditional modes
    KW  - multi-scale mixture model
    KW  - speaker clustering
    UR  - http://www.scopus.com/inward/record.url?scp=84893299796&partnerID=8YFLogxK
    UR  - http://www.scopus.com/inward/citedby.url?scp=84893299796&partnerID=8YFLogxK
    U2  - 10.1109/MLSP.2013.6661902
    DO  - 10.1109/MLSP.2013.6661902
    M3  - Conference contribution
    AN  - SCOPUS:84893299796
    SN  - 9781479911806
    BT  - IEEE International Workshop on Machine Learning for Signal Processing, MLSP
    ER  - 