Hadoop has become the de facto standard framework for big data analysis owing to its scalability. Despite the importance of Hadoop's scalability, few studies have addressed scalability in multi-rack clusters. In real-world multi-rack clusters, the network topology becomes a major scalability bottleneck because of limited network switch capacity, and in such situations adding servers to a Hadoop cluster wastes resources. Users can therefore save cost by efficiently estimating the network's influence on Hadoop before they add a new server to their cluster. In this paper, we describe a Hadoop performance model for multi-rack clusters. We modeled the network's influence on Hadoop and achieved about 95% accuracy relative to real measurements. Furthermore, we used our model to predict Hadoop scalability in large clusters and show that Hadoop scales sufficiently even in multi-rack clusters.
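The claim that limited switch capacity makes extra servers wasteful can be illustrated with a deliberately simple toy calculation. The sketch below is NOT the performance model from this paper. It is a hypothetical model that assumes a uniform all-to-all shuffle, one uplink of fixed capacity per rack, and no contention inside a rack. The function name `shuffle_time_s` and all parameter values are illustrative assumptions. Within these assumptions, the shuffle is bounded by whichever is slower: each server's NIC, or the rack uplink that carries all cross-rack traffic.

```python
def shuffle_time_s(total_bytes: float, racks: int, servers_per_rack: int,
                   nic_gbps: float, uplink_gbps: float) -> float:
    """Toy estimate of shuffle duration in seconds (hypothetical model, not the paper's)."""
    n = racks * servers_per_rack
    per_server = total_bytes / n  # bytes each server sends during the shuffle
    # Under a uniform all-to-all shuffle, this fraction of each server's data
    # goes to servers in other racks.
    cross_fraction = (n - servers_per_rack) / (n - 1) if n > 1 else 0.0
    # Bytes that must leave one rack through its single uplink.
    cross_bytes_per_rack = servers_per_rack * per_server * cross_fraction
    nic_time = per_server / (nic_gbps * 1e9 / 8)                 # Gbit/s -> bytes/s
    uplink_time = cross_bytes_per_rack / (uplink_gbps * 1e9 / 8)
    # The slower of the two resources bounds the shuffle.
    return max(nic_time, uplink_time)


if __name__ == "__main__":
    for racks, spr in [(2, 10), (2, 20), (4, 10)]:
        t = shuffle_time_s(1e12, racks, spr, nic_gbps=10, uplink_gbps=40)
        print(f"{racks} racks x {spr} servers: {t:.1f} s")
```

With these made-up numbers (1 TB shuffled, 10 Gbit/s NICs, 40 Gbit/s rack uplinks), the script prints 52.6 s, 51.3 s, and 38.5 s. Doubling the servers inside two uplink-bound racks barely changes the estimate, whereas spreading the same 40 servers over four racks reduces it noticeably. A real model would also need to account for the map and reduce phases, oversubscription, and switch behavior.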
|Host publication title||2013 5th International Conference on Computer Science and Information Technology, CSIT 2013 - Proceedings|
|Publication status||Published - 2013|
|Event||2013 5th International Conference on Computer Science and Information Technology, CSIT 2013 - Amman, Jordan|
|Duration||27 Mar 2013 → 28 Mar 2013|