Benchmark datasets for fault detection and classification in sensor data

Bas De Bruijn, Tuan Anh Nguyen, Doina Bucur, Kenji Tei

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Data measured and collected from embedded sensors often contains faults, i.e., data points which are not an accurate representation of the physical phenomenon monitored by the sensor. These data faults may be caused by deployment conditions outside the operational bounds of the node, or by short- or long-term hardware, software, or communication problems. Applications, on the other hand, expect accurate sensor data, and recent literature proposes algorithmic solutions for fault detection and classification in sensor data. To evaluate the performance of such solutions, however, the field lacks a set of benchmark sensor datasets. A benchmark dataset ideally satisfies the following criteria: (a) it is based on real-world raw sensor data from various types of sensor deployments; (b) it contains (natural or artificially injected) faulty data points reflecting various problems in the deployment, including missing data points; and (c) all data points are annotated with the ground truth, i.e., whether or not the data point is accurate, and, if faulty, the type of fault. We prepare and publish three such benchmark datasets, together with the algorithmic methods used to create them: a dataset of 280 temperature and light data subsets from 10 indoor Intel Lab sensors, a dataset of 140 subsets of outdoor temperature data from SensorScope sensors, and a dataset of 224 subsets of outdoor temperature data from 16 Smart Santander sensors. The three benchmark datasets total 5,783,504 data points and contain injected data faults of the following types known from the literature: random, malfunction, bias, drift, polynomial drift, and combinations thereof. We present algorithmic procedures and a software tool for preparing further such benchmark datasets.
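The paper's own injection tool is not reproduced here, but the fault taxonomy it names (random, malfunction, bias, drift, polynomial drift) can be sketched as simple transformations over a clean signal. The following is a minimal, hypothetical illustration assuming NumPy arrays and illustrative parameter values (offsets, slopes, window bounds); the function names and defaults are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "clean" temperature signal standing in for real deployment data.
clean = 20.0 + 2.0 * np.sin(np.linspace(0, 6 * np.pi, 1000))

def inject_bias(x, start, end, offset=5.0):
    """Bias fault: a constant offset over a window of readings."""
    y = x.copy()
    y[start:end] += offset
    return y

def inject_drift(x, start, end, slope=0.01):
    """Drift fault: an offset growing linearly with elapsed samples."""
    y = x.copy()
    y[start:end] += slope * np.arange(end - start)
    return y

def inject_polynomial_drift(x, start, end, coeff=1e-5, degree=2):
    """Polynomial drift: offset growing as a power of elapsed samples."""
    y = x.copy()
    y[start:end] += coeff * np.arange(end - start) ** degree
    return y

def inject_random(x, n_faults=10, scale=10.0, rng=rng):
    """Random fault: isolated outlier spikes at random positions."""
    y = x.copy()
    idx = rng.choice(len(y), size=n_faults, replace=False)
    y[idx] += rng.normal(0.0, scale, size=n_faults)
    return y

def inject_malfunction(x, start, end, stuck_value=0.0):
    """Malfunction fault: the sensor reports a stuck, implausible value."""
    y = x.copy()
    y[start:end] = stuck_value
    return y

# Per the abstract, every data point carries a ground-truth annotation:
# here 0 = accurate, nonzero = fault type (1 = drift in this sketch).
faulty = inject_drift(clean, 200, 400)
labels = np.zeros(len(clean), dtype=int)
labels[200:400] = 1
```

Combination faults, also listed in the abstract, would follow by composing these functions over overlapping windows; missing data points could be represented by masking values to NaN.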

Original language: English
Title of host publication: SENSORNETS 2016 - Proceedings of the 5th International Conference on Sensor Networks
Publisher: SciTePress
Pages: 185-195
Number of pages: 11
ISBN (Electronic): 9789897581694
Publication status: Published - 2016 Jan 1
Externally published: Yes
Event: 5th International Conference on Sensor Networks, SENSORNETS 2016 - Rome, Italy
Duration: 2016 Feb 19 - 2016 Feb 21



Keywords

  • Benchmark dataset
  • Data quality
  • Fault tolerance
  • Sensor data
  • Sensor data labelling

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems

Cite this

De Bruijn, B., Nguyen, T. A., Bucur, D., & Tei, K. (2016). Benchmark datasets for fault detection and classification in sensor data. In SENSORNETS 2016 - Proceedings of the 5th International Conference on Sensor Networks (pp. 185-195). SciTePress.
