A Hadoop performance model for multi-rack clusters

Jungkyu Han, Masakuni Ishii, Hiroyuki Makino

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Citations (Scopus)

Abstract

Hadoop has become the de facto standard framework for big data analysis because of its scalability. Despite the importance of that scalability, few studies have examined it in multi-rack clusters. In real-world multi-rack clusters, the network topology becomes a major scalability bottleneck because of limited network switch capacity, and adding servers to a Hadoop cluster in that situation wastes resources. Users can therefore save cost by efficiently estimating the network's influence on Hadoop before they add new servers to their clusters. In this paper, we describe a Hadoop performance model for multi-rack clusters. We modeled the network's influence on Hadoop and achieved about 95% accuracy relative to real measurements. Furthermore, we used the model to predict Hadoop's scalability in large clusters and show that Hadoop scales sufficiently well even in multi-rack clusters.

Original language: English
Title of host publication: 2013 5th International Conference on Computer Science and Information Technology, CSIT 2013 - Proceedings
Pages: 265-274
Number of pages: 10
DOIs: https://doi.org/10.1109/CSIT.2013.6588791
Publication status: Published - 2013
Externally published: Yes
Event: 2013 5th International Conference on Computer Science and Information Technology, CSIT 2013 - Amman, Jordan
Duration: 2013 Mar 27 → 2013 Mar 28

Other

Other: 2013 5th International Conference on Computer Science and Information Technology, CSIT 2013
Country: Jordan
City: Amman
Period: 13/3/27 → 13/3/28

Keywords

  • Distributed Data Processing
  • Hadoop
  • Map-Reduce
  • Performance Modeling

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Information Systems

Cite this

Han, J., Ishii, M., & Makino, H. (2013). A Hadoop performance model for multi-rack clusters. In 2013 5th International Conference on Computer Science and Information Technology, CSIT 2013 - Proceedings (pp. 265-274). [6588791] https://doi.org/10.1109/CSIT.2013.6588791