Bootstrapping K-means for big data analysis

Jungkyu Han, Min Luo

Research output: Conference contribution

6 Citations (Scopus)

Abstract

In recent years, 'big data' has become a popular term in industry. Distributed data processing middleware such as Hadoop enables companies to extract useful information from their big data. However, retrieving information from newly available big data remains difficult even with the aid of distributed data processing, because the task requires many cycles of hypothesis formulation and testing owing to the lack of prior knowledge about the data. The k-means algorithm is one of the popular algorithms used in the early stages of data mining because of its speed and unsupervised nature. With big data, however, even the k-means algorithm is not fast enough to produce a desired result within an expected time period. In this paper, we propose a fast k-means method based on the statistical bootstrapping technique. The proposed method achieves a roughly 100-fold speedup with similar accuracy compared to Lloyd's algorithm, the most popular k-means algorithm in industry.
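The abstract does not describe how bootstrapping is combined with k-means, so the following is only a minimal sketch of one plausible reading, not the authors' method: run Lloyd's algorithm on several small samples drawn with replacement, then cluster the pooled centers once more. The function names `lloyd` and `bootstrap_kmeans`, and the parameters `n_boot` and `sample_size`, are hypothetical and chosen for illustration.

```python
import numpy as np


def lloyd(X, k, rng, n_iter=20):
    """Plain Lloyd iterations starting from k randomly chosen rows of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave a center in place if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def bootstrap_kmeans(X, k, n_boot=10, sample_size=500, seed=0):
    """Assumed scheme: cluster small bootstrap samples, then cluster the pooled centers."""
    rng = np.random.default_rng(seed)
    pooled = []
    for _ in range(n_boot):
        # Bootstrap sample: row indices drawn uniformly with replacement.
        idx = rng.integers(0, len(X), size=sample_size)
        pooled.append(lloyd(X[idx], k, rng))
    # Combine the n_boot * k candidate centers into k final centers.
    return lloyd(np.vstack(pooled), k, rng)
```

Under this scheme each Lloyd run touches only `sample_size` points, so the cost grows with `n_boot * sample_size` rather than with the full dataset size. Whether this matches the speedup or accuracy reported in the paper is not something the sketch can establish.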

Original language: English
Title of host publication: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 591-596
Number of pages: 6
ISBN (Electronic): 9781479956654
DOI
Publication status: Published - Jan 7, 2015
Externally published: Yes
Event: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014 - Washington
Duration: Oct 27, 2014 to Oct 30, 2014

Other

Other: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014
City: Washington
Period: 14/10/27 to 14/10/30

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems
