Nowadays, the operations performed by Internet of Things (IoT) systems are no longer trivial, since they rely on more sophisticated devices than in the past. An IoT system is physically composed of interconnected computing, digital, and mechanical devices such as sensors and actuators. Most of the time, each device incorporates an arithmetic logic unit that can perform pre-computation or full computation on the device itself. To extract value from the data produced at the edge, the processing power offered by cloud computing is still used. However, streaming data to the cloud has limitations: the increased communication and data transfer introduce delays and consume network bandwidth. Clustering is one example of a data treatment that can be executed in the cloud. In this paper, we propose a methodology for solving the data stream clustering problem at the edge. Data stream clustering is the clustering of data that arrive continuously, such as telephone records, multimedia data, sensor data, and financial transactions. Since we use low-cost, low-capacity devices, the objective is, given a sequence of points, to construct a good clustering of the stream using a small amount of memory and time. To meet this objective, we propose a windowing scheme coupled with a sampling scheme. Under our experimental conditions, the experiments show that the clustering solutions can be controlled, with difficulties for time-stamped data but not for random data or data with well-delimited clusters. The main advantage of our scheme is that it clusters data “on the fly”, with no knowledge of or assumption about the available data: unlike batch-by-batch processing, we do not assume that all the data are known before processing begins. Our scheme also has the potential to be adapted to other classes of machine learning algorithms.
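The abstract does not specify how the windowing and sampling schemes are built, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes fixed-size tumbling windows, reservoir sampling (Algorithm R) to keep a bounded uniform sample of each window, and plain Lloyd's k-means run on that sample together with the centroids carried over from the previous window. The class `WindowedSampledClusterer` and its parameters `window_size`, `sample_size`, and `k` are hypothetical names chosen for illustration. Memory stays bounded by `sample_size + k` points no matter how long the stream runs.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 10,
           rng: Optional[random.Random] = None) -> List[Point]:
    """Plain Lloyd's k-means (an assumed choice of clustering algorithm)."""
    if rng is None:
        rng = random.Random(0)
    centroids = rng.sample(points, k) if len(points) >= k else list(points)
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in centroids]
        for p in points:  # assign each point to its nearest centroid
            i = min(range(len(centroids)), key=lambda j: _sq_dist(p, centroids[j]))
            clusters[i].append(p)
        updated = []
        for c, members in zip(centroids, clusters):
            if members:  # move the centroid to the mean of its members
                dim = len(members[0])
                updated.append(tuple(sum(m[d] for m in members) / len(members)
                                     for d in range(dim)))
            else:  # keep an empty cluster's centroid where it is
                updated.append(c)
        centroids = updated
    return centroids


class WindowedSampledClusterer:
    """Hypothetical sketch: tumbling windows + reservoir sampling + k-means."""

    def __init__(self, window_size: int, sample_size: int, k: int, seed: int = 0):
        self.window_size = window_size
        self.sample_size = sample_size
        self.k = k
        self.rng = random.Random(seed)
        self._reservoir: List[Point] = []
        self._seen = 0  # points seen in the current window
        self.centroids: List[Point] = []

    def add(self, point: Point) -> None:
        """Consume one stream point in O(1) memory beyond the reservoir."""
        self._seen += 1
        # Algorithm R: after n points, each one is in the reservoir with prob. s/n.
        if len(self._reservoir) < self.sample_size:
            self._reservoir.append(point)
        else:
            j = self.rng.randrange(self._seen)
            if j < self.sample_size:
                self._reservoir[j] = point
        if self._seen == self.window_size:
            self._close_window()

    def _close_window(self) -> None:
        # Simplification: previous centroids are re-clustered as ordinary
        # (unweighted) points so the model carries over between windows.
        data = self._reservoir + self.centroids
        self.centroids = kmeans(data, self.k, rng=self.rng)
        self._reservoir = []
        self._seen = 0
```

Usage: create `WindowedSampledClusterer(window_size=200, sample_size=50, k=2)`, call `add(point)` for each arriving point, and read `centroids` at any time. A weighted merge of old centroids, or sliding instead of tumbling windows, would be natural variations; the abstract does not say which, if any, the paper uses.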
ASJC Scopus subject areas
- Computer Networks and Communications