Information estimators for weighted observations

Hideitsu Hino, Noboru Murata

    Research output: Contribution to journal › Article

    8 Citations (Scopus)

    Abstract

    The Shannon information content is a valuable numerical characteristic of probability distributions, and estimating it from an observed dataset is an important problem in statistics, information theory, and machine learning. This paper proposes information estimators and demonstrates some of their applications. When the given data are associated with weights, each datum contributes differently to the empirical average of statistics; the proposed estimators are designed to handle such weighted data. Like other conventional methods, the basic proposed estimator contains a parameter to be tuned and is computationally expensive, so it is further modified to be more computationally efficient and to require no tuning parameter. The proposed methods are also extended to estimate the cross-entropy, entropy, and Kullback-Leibler divergence. Simple numerical experiments show that the information estimators work properly, and the estimators are then applied to two specific problems: distribution-preserving data compression and weight optimization for ensemble regression.
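
    To make the weighted empirical average concrete, below is a minimal sketch of a weighted plug-in estimator in Python. This is not the paper's actual estimator: a weighted Gaussian kernel density estimate is simply substituted into the weighted average of -log p(x), and the function names and the bandwidth h are illustrative assumptions. The bandwidth plays the role of the tuning parameter mentioned in the abstract, and uniform weights recover the ordinary unweighted case.

    # Minimal sketch (assumed names; NOT the paper's estimator): a weighted
    # Gaussian KDE plugged into the weighted empirical average of -log p(x).
    import numpy as np

    def weighted_kde(x_query, x_data, weights, h):
        """Weighted Gaussian KDE for 1-D data, evaluated at x_query."""
        d2 = (x_query[:, None] - x_data[None, :]) ** 2
        kern = np.exp(-d2 / (2.0 * h ** 2)) / np.sqrt(2.0 * np.pi * h ** 2)
        return kern @ weights              # weights are assumed to sum to 1

    def entropy(x, w, h=0.3):
        """H(p) ~ -sum_i w_i log p_hat(x_i): the weighted empirical average."""
        return -np.sum(w * np.log(weighted_kde(x, x, w, h)))

    def cross_entropy(x_p, w_p, x_q, w_q, h=0.3):
        """H(p, q) ~ -sum_i w_p[i] log q_hat(x_p[i])."""
        return -np.sum(w_p * np.log(weighted_kde(x_p, x_q, w_q, h)))

    def kl_divergence(x_p, w_p, x_q, w_q, h=0.3):
        """KL(p || q) = H(p, q) - H(p)."""
        return cross_entropy(x_p, w_p, x_q, w_q, h) - entropy(x_p, w_p, h)

    # Example: a uniformly weighted standard-normal sample; the true entropy
    # is 0.5 * log(2 * pi * e) ~ 1.419 nats.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1000)
    w = np.full(1000, 1.0 / 1000)
    print(entropy(x, w))

    Note that h is exactly the kind of tuning parameter the abstract says the modified estimator eliminates; this sketch keeps it only to show where such a parameter enters a plug-in construction.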

    Original language: English
    Pages (from-to): 260-275
    Number of pages: 16
    Journal: Neural Networks
    Volume: 46
    DOI: 10.1016/j.neunet.2013.06.005
    ISSN: 0893-6080
    PubMed ID: 23859828
    Publication status: Published - 2013 Oct

    Keywords

    • Entropy estimation
    • Information estimation
    • Weighted data

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Cognitive Neuroscience
