A nonparametric clustering algorithm with a quantile-based likelihood estimator

Hideitsu Hino, Noboru Murata

    Research output: Contribution to journal › Article

    3 Citations (Scopus)

    Abstract

    Clustering is a representative unsupervised learning task and one of the important approaches in exploratory data analysis. By its very nature, clustering without strong assumptions on the data distribution is desirable. Information-theoretic clustering is a class of clustering methods that optimize information-theoretic quantities such as entropy and mutual information. These quantities can be estimated nonparametrically, so information-theoretic clustering algorithms are capable of capturing various intrinsic data structures. It is also possible to estimate information-theoretic quantities from a data set with a sampling weight attached to each datum. By assuming that the data are sampled from a certain cluster and assigning sampling weights that differ across clusters, cluster-conditional information-theoretic quantities can be estimated. In this letter, a simple iterative clustering algorithm is proposed based on a nonparametric estimator of the log likelihood for weighted data sets. The clustering algorithm is also derived from the principle of conditional entropy minimization with maximum entropy regularization. The proposed algorithm contains no tuning parameters. It is experimentally shown to be comparable to or better than conventional nonparametric clustering methods.
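    The iterative scheme outlined in the abstract — estimate cluster-conditional likelihoods from weighted data, then update soft cluster memberships from a conditional-entropy objective with maximum entropy regularization — can be illustrated with a simplified sketch. The code below is not the authors' method: it substitutes a weighted Gaussian kernel density estimate for the paper's quantile-based likelihood estimator, and the farthest-point seeding, bandwidth `h`, and function names are illustrative assumptions.

```python
import numpy as np

def weighted_log_likelihood(X, w, h=0.5):
    """Log-likelihood of each point under a weighted Gaussian KDE.

    Simplified stand-in (assumption) for the paper's quantile-based
    nonparametric estimator; w is a sampling weight per datum, summing to 1.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. dists
    dens = np.exp(-d2 / (2.0 * h * h)) @ w                # weighted kernel sum
    return np.log(np.maximum(dens, 1e-300))               # guard against log(0)

def entropy_clustering(X, n_clusters=2, n_iter=30, h=0.5):
    """Iterative soft clustering sketch.

    The softmax membership update is the closed-form minimizer of the
    cluster-conditional negative log likelihood plus a maximum-entropy
    regularizer on the memberships, mirroring the derivation the
    abstract describes.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Deterministic farthest-point seeding — an implementation choice
    # here, not part of the paper.
    seeds = [0]
    for _ in range(n_clusters - 1):
        seeds.append(int(d2[seeds].min(axis=0).argmax()))
    W = np.exp(-d2[:, seeds] / (2.0 * h * h)) + 1e-12     # soft memberships (n, k)
    W /= W.sum(1, keepdims=True)
    for _ in range(n_iter):
        # Cluster-conditional log likelihoods from the weighted estimator,
        # plus a log-prior term from the current cluster sizes.
        ll = np.column_stack([
            weighted_log_likelihood(X, W[:, c] / W[:, c].sum(), h)
            for c in range(n_clusters)
        ]) + np.log(W.mean(0))
        W = np.exp(ll - ll.max(1, keepdims=True))         # softmax update
        W /= W.sum(1, keepdims=True)
    return W.argmax(1)
```

    On two well-separated blobs the memberships sharpen within a few iterations; the softmax arises because the maximum entropy regularizer keeps the memberships soft rather than forcing hard assignments at each step.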

    Original language: English
    Pages (from-to): 2074-2101
    Number of pages: 28
    Journal: Neural Computation
    Volume: 26
    Issue number: 9
    DOI: 10.1162/NECO_a_00628
    Publication status: Published - 2014 Sep 13

    ASJC Scopus subject areas

    • Cognitive Neuroscience
    • Arts and Humanities (miscellaneous)

    Cite this

    A nonparametric clustering algorithm with a quantile-based likelihood estimator. / Hino, Hideitsu; Murata, Noboru.

    In: Neural Computation, Vol. 26, No. 9, 13.09.2014, p. 2074-2101.

    @article{76b1001691344fdfa26ec246bcf9237d,
    title = "A nonparametric clustering algorithm with a quantile-based likelihood estimator",
    author = "Hideitsu Hino and Noboru Murata",
    year = "2014",
    month = "9",
    day = "13",
    doi = "10.1162/NECO_a_00628",
    language = "English",
    volume = "26",
    pages = "2074--2101",
    journal = "Neural Computation",
    issn = "0899-7667",
    publisher = "MIT Press Journals",
    number = "9",

    }
