Dynamic profiling and feedback framework for reduce-side join

Makoto Nakayama, Kenichi Yamazaki, Satoshi Tanaka, Hironori Kasahara

    Research output: Conference contribution

    Abstract

    MapReduce has become popular, and Reduce-side join is one of the most important applications of MapReduce. Data skew, in which the data load assigned to each Reduce task fluctuates task by task, increases the MapReduce job completion time. This paper proposes a dynamic profiling and feedback framework that works on a MapReduce cluster. The framework allows programmers to build their own algorithm to address data skew on Reduce-side join based on their specific knowledge and/or requirements. This paper also proposes an estimation method that makes our framework adapt to a wide range of MapReduce cluster sizes. This paper presents two example algorithms to address data skew using the estimation method, and the experimental results show up to a 2.59 times speed-up of join completion time on a cluster with 50 servers and highly skewed input data.
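The effect of data skew described in the abstract can be illustrated with a small sketch. This is not the paper's framework or either of its two example algorithms. It is a toy comparison, under assumed inputs, between a modulo-style partitioner and a simple greedy assignment that uses profiled per-key record counts. The names `hash_partition`, `greedy_partition`, and the `key_counts` profile are hypothetical.

```python
def hash_partition(key_counts, num_reducers):
    """Assign each join key to reducer (key mod num_reducers); return per-reducer loads."""
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        loads[key % num_reducers] += count
    return loads


def greedy_partition(key_counts, num_reducers):
    """Assumed profile-driven alternative: place the heaviest keys first,
    each on the currently least-loaded reducer."""
    loads = [0] * num_reducers
    assignment = {}
    # Sort by descending record count; break ties by key for determinism.
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = loads.index(min(loads))
        assignment[key] = target
        loads[target] += count
    return loads, assignment


# Hypothetical profile: join key -> number of records carrying that key.
key_counts = {0: 900, 1: 50, 2: 40, 3: 30, 4: 800, 5: 20, 6: 10, 7: 5}

hash_loads = hash_partition(key_counts, 4)
greedy_loads, assignment = greedy_partition(key_counts, 4)

# The slowest (most loaded) Reduce task bounds the job completion time.
print("hash loads:  ", hash_loads, "max =", max(hash_loads))
print("greedy loads:", greedy_loads, "max =", max(greedy_loads))
```

In this toy profile the heaviest reducer load drops from 1700 records to 900. The remaining imbalance comes from a single key with 900 records, which this sketch never splits across reducers.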

    Original language: English
    Title of host publication: Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013
    Pages: 1255-1262
    Number of pages: 8
    DOI: https://doi.org/10.1109/CSE.2013.187
    Publication status: Published - 2013
    Event: 2013 16th IEEE International Conference on Computational Science and Engineering, CSE 2013 - Sydney, NSW
    Duration: 3 Dec 2013 to 5 Dec 2013

    Fingerprint

    Feedback
    Servers

    ASJC Scopus subject areas

    • Computer Science (miscellaneous)

    Cite this

    Nakayama, M., Yamazaki, K., Tanaka, S., & Kasahara, H. (2013). Dynamic profiling and feedback framework for reduce-side join. In Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013 (pp. 1255-1262). [6755369] https://doi.org/10.1109/CSE.2013.187

    Dynamic profiling and feedback framework for reduce-side join. / Nakayama, Makoto; Yamazaki, Kenichi; Tanaka, Satoshi; Kasahara, Hironori.

    Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013. 2013. p. 1255-1262 6755369.

    Nakayama, M, Yamazaki, K, Tanaka, S & Kasahara, H 2013, Dynamic profiling and feedback framework for reduce-side join. In Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013, 6755369, pp. 1255-1262, 2013 16th IEEE International Conference on Computational Science and Engineering, CSE 2013, Sydney, NSW, 3 Dec 2013. https://doi.org/10.1109/CSE.2013.187
    Nakayama M, Yamazaki K, Tanaka S, Kasahara H. Dynamic profiling and feedback framework for reduce-side join. In Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013. 2013. p. 1255-1262. 6755369 https://doi.org/10.1109/CSE.2013.187
    Nakayama, Makoto ; Yamazaki, Kenichi ; Tanaka, Satoshi ; Kasahara, Hironori. / Dynamic profiling and feedback framework for reduce-side join. Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013. 2013. pp. 1255-1262
    @inproceedings{d3bf9449748e4b86b4371e13c0534ac8,
    title = "Dynamic profiling and feedback framework for reduce-side join",
    abstract = "MapReduce has become popular, and Reduce-side join is one of the most important applications of MapReduce. Data skew, in which the data load assigned to each Reduce task fluctuates task by task, increases the MapReduce job completion time. This paper proposes a dynamic profiling and feedback framework that works on a MapReduce cluster. The framework allows programmers to build their own algorithm to address data skew on Reduce-side join based on their specific knowledge and/or requirements. This paper also proposes an estimation method that makes our framework adapt to a wide range of MapReduce cluster sizes. This paper presents two example algorithms to address data skew using the estimation method, and the experimental results show up to a 2.59 times speed-up of join completion time on a cluster with 50 servers and highly skewed input data.",
    keywords = "Data skew, Feedback, Framework, Profiling, Reduce-side Join",
    author = "Makoto Nakayama and Kenichi Yamazaki and Satoshi Tanaka and Hironori Kasahara",
    year = "2013",
    doi = "10.1109/CSE.2013.187",
    language = "English",
    pages = "1255--1262",
    booktitle = "Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013",

    }

    TY - GEN
    T1 - Dynamic profiling and feedback framework for reduce-side join
    AU - Nakayama, Makoto
    AU - Yamazaki, Kenichi
    AU - Tanaka, Satoshi
    AU - Kasahara, Hironori
    PY - 2013
    Y1 - 2013
    N2 - MapReduce has become popular, and Reduce-side join is one of the most important applications of MapReduce. Data skew, in which the data load assigned to each Reduce task fluctuates task by task, increases the MapReduce job completion time. This paper proposes a dynamic profiling and feedback framework that works on a MapReduce cluster. The framework allows programmers to build their own algorithm to address data skew on Reduce-side join based on their specific knowledge and/or requirements. This paper also proposes an estimation method that makes our framework adapt to a wide range of MapReduce cluster sizes. This paper presents two example algorithms to address data skew using the estimation method, and the experimental results show up to a 2.59 times speed-up of join completion time on a cluster with 50 servers and highly skewed input data.
    AB - MapReduce has become popular, and Reduce-side join is one of the most important applications of MapReduce. Data skew, in which the data load assigned to each Reduce task fluctuates task by task, increases the MapReduce job completion time. This paper proposes a dynamic profiling and feedback framework that works on a MapReduce cluster. The framework allows programmers to build their own algorithm to address data skew on Reduce-side join based on their specific knowledge and/or requirements. This paper also proposes an estimation method that makes our framework adapt to a wide range of MapReduce cluster sizes. This paper presents two example algorithms to address data skew using the estimation method, and the experimental results show up to a 2.59 times speed-up of join completion time on a cluster with 50 servers and highly skewed input data.
    KW - Data skew
    KW - Feedback
    KW - Framework
    KW - Profiling
    KW - Reduce-side Join
    UR - http://www.scopus.com/inward/record.url?scp=84900380009&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=84900380009&partnerID=8YFLogxK
    U2 - 10.1109/CSE.2013.187
    DO - 10.1109/CSE.2013.187
    M3 - Conference contribution
    AN - SCOPUS:84900380009
    SP - 1255
    EP - 1262
    BT - Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013
    ER -