### Abstract

Most existing change-point detection methods assume that the observation at each time step is a single multi-dimensional vector, an assumption that fails in many practical situations. This paper addresses that limitation with a nonparametric, computationally efficient method for a setting where each observation is a collection of random variables, which we call a bag of data. After estimating the underlying distribution behind each bag of data and embedding those distributions in a metric space, we derive a change-point score by evaluating how the sequence of distributions fluctuates in the metric space, using a distance-based information estimator. We also incorporate a procedure that adaptively determines when to raise alerts by calculating the confidence interval of the change-point score at each time step. This avoids false alarms in highly noisy situations and enables the detection of changes of various magnitudes. Experimental studies and numerical examples on both synthetic and real datasets demonstrate the generality and effectiveness of the approach.
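
The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one-dimensional bags, approximates the Earth Mover's Distance by comparing quantile functions, uses the gap between mean log-distances across and within two adjacent windows as a simplified stand-in for the distance-based information estimator, and builds the confidence interval with an ordinary bootstrap. The function names, the window size `w`, and the alert rule are hypothetical choices made for illustration.

```python
import numpy as np


def emd_1d(bag_a, bag_b, n_quantiles=100):
    """Approximate 1-D Earth Mover's Distance by comparing quantile functions."""
    q = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return float(np.mean(np.abs(np.quantile(bag_a, q) - np.quantile(bag_b, q))))


def change_score(bags, t, w):
    """Mean log-distance across the two windows minus the mean within them.

    Simplified stand-in for a distance-based information estimator (assumption).
    Requires w >= 2 so that within-window pairs exist.
    """
    ref, test = bags[t - w:t], bags[t:t + w]
    eps = 1e-12  # guards against log(0) for identical bags
    cross = [np.log(emd_1d(a, b) + eps) for a in ref for b in test]
    within = [np.log(emd_1d(x, y) + eps)
              for win in (ref, test)
              for i, x in enumerate(win)
              for y in win[i + 1:]]
    return float(np.mean(cross) - np.mean(within))


def score_interval(bags, t, w, n_boot=50, alpha=0.05, rng=None):
    """Percentile bootstrap interval of the score, resampling points inside each bag."""
    rng = np.random.default_rng() if rng is None else rng
    window = bags[t - w:t + w]
    scores = []
    for _ in range(n_boot):
        resampled = [rng.choice(b, size=len(b), replace=True) for b in window]
        scores.append(change_score(resampled, w, w))
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Synthetic sequence: 10 bags drawn from N(0, 1), then 10 bags from N(3, 1).
# Bag sizes vary, since a bag is a collection rather than a fixed-length vector.
rng = np.random.default_rng(0)
bags = [rng.normal(0.0, 1.0, size=rng.integers(80, 120)) for _ in range(10)]
bags += [rng.normal(3.0, 1.0, size=rng.integers(80, 120)) for _ in range(10)]

lo_quiet, hi_quiet = score_interval(bags, t=5, w=5, rng=rng)     # no change here
lo_change, hi_change = score_interval(bags, t=10, w=5, rng=rng)  # true change point

# Hypothetical alert rule: flag t when its interval lies entirely above
# the interval observed at a reference time step.
alert = lo_change > hi_quiet
print(f"quiet: [{lo_quiet:.2f}, {hi_quiet:.2f}]  "
      f"change: [{lo_change:.2f}, {hi_change:.2f}]  alert={alert}")
```

Comparing intervals rather than point scores is what lets the alert adapt to noise: a noisy sequence widens the intervals, so a larger shift is needed before they separate.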

Original language | English |
---|---|
Article number | 7095580 |
Pages (from-to) | 2632-2644 |
Number of pages | 13 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 27 |
Issue number | 10 |
DOIs | https://doi.org/10.1109/TKDE.2015.2426693 |
Publication status | Published - 2015 Oct 1 |

### Keywords

- Anomaly detection
- Change-point detection
- Earth Mover's Distance
- Entropy estimator

### ASJC Scopus subject areas

- Computational Theory and Mathematics
- Information Systems
- Computer Science Applications

### Cite this

Koshijima, K., Hino, H., & Murata, N. (2015). Change-Point Detection in a Sequence of Bags-of-Data. *IEEE Transactions on Knowledge and Data Engineering*, *27*(10), 2632-2644. [7095580]. https://doi.org/10.1109/TKDE.2015.2426693

Research output: Contribution to journal › Article

TY - JOUR

T1 - Change-Point Detection in a Sequence of Bags-of-Data

AU - Koshijima, Kensuke

AU - Hino, Hideitsu

AU - Murata, Noboru

PY - 2015/10/1

Y1 - 2015/10/1

KW - anomaly detection

KW - Change-point detection

KW - Earth Movers Distance

KW - entropy estimator

UR - http://www.scopus.com/inward/record.url?scp=84941585413&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84941585413&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2015.2426693

DO - 10.1109/TKDE.2015.2426693

M3 - Article

AN - SCOPUS:84941585413

VL - 27

SP - 2632

EP - 2644

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 10

M1 - 7095580

ER -