Data stream clustering for low-cost machines

Christophe Cérin*, Keiji Kimura, Mamadou Sow

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The operations performed by Internet of Things (IoT) systems are no longer trivial, since they rely on more sophisticated devices than in the past. An IoT system is physically composed of connected computing, digital, and mechanical devices such as sensors or actuators. Most of the time, each device incorporates an arithmetic logic unit, so that data can be pre-processed or fully processed on the device itself. To extract value from the data produced at the edge, the processing power offered by cloud computing is still used. However, streaming data to the cloud has limitations: the additional communication and data transfer introduce delays and consume network bandwidth. Clustering is one example of a treatment that is typically executed in the cloud. In this paper, we propose a methodology for solving the data stream clustering problem at the edge. Data stream clustering is defined as the clustering of data that arrive continuously, such as telephone records, multimedia data, sensor data, and financial transactions. Since we use low-cost, low-capacity devices, the objective is, given a sequence of points, to construct a good clustering of the stream using a small amount of memory and time. To meet this objective, we propose a 'windowing' scheme coupled with a sampling scheme. Under our experimental conditions, the experiments show that the clustering solutions can be controlled; this is difficult for time-stamped data but not for random data or for data with well-delimited clusters. The main advantage of our scheme is that it clusters data "on the fly", with no knowledge of or assumption about the available data. We do not assume that all the data are known before they are processed batch by batch. Our scheme also has the potential to be adapted to other classes of machine learning algorithms.
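The abstract does not give the details of the windowing or sampling schemes, so the following Python sketch is only an illustration of the general idea and not the authors' algorithm. It assumes fixed-size, non-overlapping windows, uniform reservoir sampling inside each window, and a sequential (MacQueen-style) k-means update of the centers; all function names and parameters (`windows`, `reservoir_sample`, `cluster_stream`, `window_size`, `sample_size`) are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]


def windows(stream: Iterable[Point], size: int) -> Iterator[List[Point]]:
    """Cut the stream into consecutive, non-overlapping windows (assumed scheme)."""
    buf: List[Point] = []
    for p in stream:
        buf.append(p)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:  # final, possibly shorter window
        yield buf


def reservoir_sample(points: Iterable[Point], k: int, rng: random.Random) -> List[Point]:
    """Uniform sample of at most k points in a single pass (Algorithm R)."""
    sample: List[Point] = []
    for i, p in enumerate(points):
        if i < k:
            sample.append(p)
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = p
    return sample


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_stream(stream: Iterable[Point], n_clusters: int, window_size: int,
                   sample_size: int, seed: int = 0) -> List[Point]:
    """Keep only n_clusters centers plus one window in memory."""
    rng = random.Random(seed)
    centers: List[Point] = []
    counts: List[int] = []
    for win in windows(stream, window_size):
        for p in reservoir_sample(win, sample_size, rng):
            if len(centers) < n_clusters and p not in centers:
                # Seed the centers with the first distinct sampled points.
                centers.append(p)
                counts.append(1)
                continue
            # Sequential k-means: move the nearest center toward p (running mean).
            c = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            counts[c] += 1
            eta = 1.0 / counts[c]
            centers[c] = tuple(x + eta * (y - x) for x, y in zip(centers[c], p))
    return centers
```

In this sketch, memory use is bounded by `window_size` points plus `n_clusters` centers, and only `sample_size` points per window are used to update the centers, which is one way to trade clustering quality for time on a low-capacity device.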

Original language: English
Pages (from-to): 57-70
Number of pages: 14
Journal: Journal of Parallel and Distributed Computing
Volume: 166
Publication status: Published - 2022 Aug

Keywords

  • Edge AI
  • Experiments on heterogeneous and low cost hardware
  • Machine-learning algorithms
  • Online data stream clustering

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence
