Change-Point Detection in a Sequence of Bags-of-Data

Kensuke Koshijima, Hideitsu Hino, Noboru Murata

    Research output: Contribution to journal (Article)

    2 Citations (Scopus)

    Abstract

    This paper addresses a limitation common to most existing change-point detection methods by proposing a nonparametric, computationally efficient method. Most existing methods assume that the observation at each time step is a single multi-dimensional vector, yet there are many situations where this assumption does not hold. We therefore consider a setting in which each observation is a collection of random variables, which we call a bag of data. After estimating the distribution underlying each bag of data and embedding those distributions in a metric space, we derive the change-point score by using a distance-based information estimator to evaluate how the sequence of distributions fluctuates in that space. We also incorporate a procedure that adaptively determines when to raise alerts by calculating the confidence interval of the change-point score at each time step. This procedure avoids false alarms in highly noisy situations and enables the detection of changes of various magnitudes. Experimental studies and numerical examples on both synthetic and real datasets demonstrate the generality and effectiveness of the approach.
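The pipeline outlined in the abstract (compare bags as distributions, score how the sequence shifts, then alert based on a confidence interval) can be illustrated with a minimal Python sketch. This is not the authors' estimator: it assumes one-dimensional bags of equal size, uses the closed-form Earth Mover's Distance for sorted samples, replaces the distance-based information estimator with a simple contrast between cross-window and within-window distances, and obtains an interval by bootstrap resampling. All function names and parameters (`emd_1d`, `change_score`, `bootstrap_interval`, `w`, `n_boot`, `alpha`) are hypothetical.

```python
import random


def emd_1d(a, b):
    """Earth Mover's Distance between two equal-size 1-D samples.

    For equal-size empirical distributions on the line, the optimal
    transport plan matches sorted values, so the distance is the mean
    absolute difference of the sorted samples.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes bags of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def change_score(bags, t, w):
    """Simplified score at time t (an assumption, not the paper's estimator).

    Mean distance between bags in the reference window [t-w, t) and the
    test window [t, t+w), minus the mean distance between bags inside
    the same window. Requires w >= 2.
    """
    ref, test = bags[t - w:t], bags[t:t + w]
    cross = [emd_1d(r, s) for r in ref for s in test]
    within = [emd_1d(x, y)
              for window in (ref, test)
              for i, x in enumerate(window)
              for y in window[i + 1:]]
    return sum(cross) / len(cross) - sum(within) / len(within)


def bootstrap_interval(bags, t, w, n_boot=200, alpha=0.05, rng=None):
    """Percentile bootstrap interval for the score at time t.

    Each bag in the two windows is resampled with replacement and the
    score is recomputed; the interval is read off the sorted scores.
    """
    rng = rng or random.Random(0)
    scores = []
    for _ in range(n_boot):
        resampled = [[rng.choice(bag) for _ in bag]
                     for bag in bags[t - w:t + w]]
        scores.append(change_score(resampled, w, w))
    scores.sort()
    lower = scores[int(alpha / 2 * n_boot)]
    upper = scores[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    gen = random.Random(1)
    # Ten bags drawn around 0, then ten bags drawn around 3.
    demo = ([[gen.gauss(0.0, 1.0) for _ in range(50)] for _ in range(10)]
            + [[gen.gauss(3.0, 1.0) for _ in range(50)] for _ in range(10)])
    for t in (5, 10):
        print(t, change_score(demo, t, 5), bootstrap_interval(demo, t, 5))
```

One possible alert rule under these assumptions is to flag time t when the lower end of the interval exceeds a chosen threshold, so that a noisy but unchanged sequence, whose interval straddles zero, does not trigger an alarm.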

    Original language: English
    Article number: 7095580
    Pages (from-to): 2632-2644
    Number of pages: 13
    Journal: IEEE Transactions on Knowledge and Data Engineering
    Volume: 27
    Issue number: 10
    DOI: 10.1109/TKDE.2015.2426693
    Publication status: Published - 2015 Oct 1

    Keywords

    • anomaly detection
    • Change-point detection
    • Earth Movers Distance
    • entropy estimator

    ASJC Scopus subject areas

    • Computational Theory and Mathematics
    • Information Systems
    • Computer Science Applications

    Cite this

    Change-Point Detection in a Sequence of Bags-of-Data. / Koshijima, Kensuke; Hino, Hideitsu; Murata, Noboru.

    In: IEEE Transactions on Knowledge and Data Engineering, Vol. 27, No. 10, 7095580, 01.10.2015, p. 2632-2644.

    Koshijima, Kensuke ; Hino, Hideitsu ; Murata, Noboru. / Change-Point Detection in a Sequence of Bags-of-Data. In: IEEE Transactions on Knowledge and Data Engineering. 2015 ; Vol. 27, No. 10. pp. 2632-2644.
    @article{00416249b413411ba42af27be491566d,
      title = "Change-Point Detection in a Sequence of Bags-of-Data",
      abstract = "In this paper, the limitation that is prominent in most existing works of change-point detection methods is addressed by proposing a nonparametric, computationally efficient method. The limitation is that most works assume that each data point observed at each time step is a single multi-dimensional vector. However, there are many situations where this does not hold. Therefore, a setting where each observation is a collection of random variables, which we call a bag of data, is considered. After estimating the underlying distribution behind each bag of data and embedding those distributions in a metric space, the change-point score is derived by evaluating how the sequence of distributions is fluctuating in the metric space using a distance-based information estimator. Also, a procedure that adaptively determines when to raise alerts is incorporated by calculating the confidence interval of the change-point score at each time step. This avoids raising false alarms in highly noisy situations and enables detecting changes of various magnitudes. A number of experimental studies and numerical examples are provided to demonstrate the generality and the effectiveness of our approach with both synthetic and real datasets.",
      keywords = "anomaly detection, Change-point detection, Earth Movers Distance, entropy estimator",
      author = "Kensuke Koshijima and Hideitsu Hino and Noboru Murata",
      year = "2015",
      month = "10",
      day = "1",
      doi = "10.1109/TKDE.2015.2426693",
      language = "English",
      volume = "27",
      pages = "2632--2644",
      journal = "IEEE Transactions on Knowledge and Data Engineering",
      issn = "1041-4347",
      publisher = "IEEE Computer Society",
      number = "10",
    }

    TY - JOUR
    T1 - Change-Point Detection in a Sequence of Bags-of-Data
    AU - Koshijima, Kensuke
    AU - Hino, Hideitsu
    AU - Murata, Noboru
    PY - 2015/10/1
    Y1 - 2015/10/1
    AB - In this paper, the limitation that is prominent in most existing works of change-point detection methods is addressed by proposing a nonparametric, computationally efficient method. The limitation is that most works assume that each data point observed at each time step is a single multi-dimensional vector. However, there are many situations where this does not hold. Therefore, a setting where each observation is a collection of random variables, which we call a bag of data, is considered. After estimating the underlying distribution behind each bag of data and embedding those distributions in a metric space, the change-point score is derived by evaluating how the sequence of distributions is fluctuating in the metric space using a distance-based information estimator. Also, a procedure that adaptively determines when to raise alerts is incorporated by calculating the confidence interval of the change-point score at each time step. This avoids raising false alarms in highly noisy situations and enables detecting changes of various magnitudes. A number of experimental studies and numerical examples are provided to demonstrate the generality and the effectiveness of our approach with both synthetic and real datasets.
    KW - anomaly detection
    KW - Change-point detection
    KW - Earth Movers Distance
    KW - entropy estimator
    UR - http://www.scopus.com/inward/record.url?scp=84941585413&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=84941585413&partnerID=8YFLogxK
    U2 - 10.1109/TKDE.2015.2426693
    DO - 10.1109/TKDE.2015.2426693
    M3 - Article
    AN - SCOPUS:84941585413
    VL - 27
    SP - 2632
    EP - 2644
    JO - IEEE Transactions on Knowledge and Data Engineering
    JF - IEEE Transactions on Knowledge and Data Engineering
    SN - 1041-4347
    IS - 10
    M1 - 7095580
    ER -