Abstract
Hadoop has become the de facto standard framework for big data analysis because of its scalability. Despite the importance of that scalability, few studies have examined it in multi-rack clusters. In real-world multi-rack clusters, the network topology becomes a major scalability bottleneck because of limited network switch capacity. Adding servers to a Hadoop cluster in that situation wastes resources. Users can therefore save cost by efficiently estimating the network's influence on Hadoop before they add a new server to their clusters. In this paper, we describe a Hadoop performance model for multi-rack clusters. We modeled the network's influence on Hadoop and achieved about 95% accuracy against real measurements. Furthermore, we used our model to predict Hadoop's scalability in large clusters and show that Hadoop scales sufficiently even in multi-rack clusters.
Original language | English |
---|---|
Title of host publication | 2013 5th International Conference on Computer Science and Information Technology, CSIT 2013 - Proceedings |
Pages | 265-274 |
Number of pages | 10 |
Publication status | Published - 2013 |
Externally published | Yes |
Event | 2013 5th International Conference on Computer Science and Information Technology, CSIT 2013 - Amman, Jordan. Duration: 2013 Mar 27 → 2013 Mar 28 |
Other
Other | 2013 5th International Conference on Computer Science and Information Technology, CSIT 2013 |
---|---|
Country/Territory | Jordan |
City | Amman |
Period | 2013 Mar 27 → 2013 Mar 28 |
Keywords
- Distributed Data Processing
- Hadoop
- Map-Reduce
- Performance Modeling
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Information Systems