Dynamic profiling and feedback framework for reduce-side join

Makoto Nakayama, Kenichi Yamazaki, Satoshi Tanaka, Hironori Kasahara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

MapReduce has become popular, and Reduce-side join is one of the most important applications of MapReduce. Data skew, in which the data load assigned to each Reduce task varies from task to task, increases the MapReduce job completion time. This paper proposes a dynamic profiling and feedback framework that works on a MapReduce cluster. The framework allows programmers to build their own algorithms to address data skew in Reduce-side join based on their specific knowledge and/or requirements. This paper also proposes an estimation method that allows the framework to adapt to a wide range of MapReduce cluster sizes. This paper presents two example algorithms that address data skew using the estimation method, and the experimental results show up to a 2.59-times speed-up in join completion time on a cluster with 50 servers and highly skewed input data.
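The abstract describes the approach only at a high level. The Python sketch below illustrates the general profile-then-feedback idea for a Reduce-side join: count records per join key in a sample of map output, scale the sampled counts into an estimate, and feed that estimate back into the key-to-reducer assignment. It is not either of the paper's two algorithms or its estimation method, which the abstract does not describe. Everything in it is a hypothetical choice for illustration: the record shape `(join_key, tag, payload)` with tags `"L"` and `"R"`, the CRC32-modulo baseline standing in for default hash partitioning, the inverse-sampling-ratio estimator, and the greedy longest-processing-time (LPT) placement used as the feedback policy. It also assumes reducers run in parallel and take time proportional to the records they receive, so the largest reducer load stands in for completion time.

```python
import heapq
import zlib
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (join_key, table_tag, payload); tag is "L" or "R".
Record = Tuple[str, str, str]


def hash_partition(key: str, num_reducers: int) -> int:
    """Static baseline: stable hash of the join key, modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def profile_keys(sample: Iterable[Record]) -> Counter:
    """Profiling step: count records per join key in a sample of map output."""
    return Counter(key for key, _, _ in sample)


def estimate_counts(sample_counts: Counter, sampling_ratio: float) -> Dict[str, float]:
    """Assumed estimator: scale sampled counts up by the inverse sampling ratio."""
    return {key: count / sampling_ratio for key, count in sample_counts.items()}


def reducer_loads(counts: Dict[str, float], plan: Dict[str, int], num_reducers: int) -> List[float]:
    """Total records each reducer receives under a key -> reducer plan."""
    loads = [0.0] * num_reducers
    for key, count in counts.items():
        loads[plan[key]] += count
    return loads


def skew_factor(loads: List[float]) -> float:
    """max/mean load; under the time-proportional-to-load assumption,
    the most loaded reducer bounds the reduce-phase completion time."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


def feedback_plan(est_counts: Dict[str, float], num_reducers: int) -> Dict[str, int]:
    """Feedback step (hypothetical policy): heaviest keys first, each placed on
    the currently least-loaded reducer (greedy LPT).

    Every record of a key stays on one reducer, so a single key heavier than
    the average load cannot be balanced by this plan alone.
    """
    heap = [(0.0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    plan: Dict[str, int] = {}
    for key, count in sorted(est_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        plan[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    return plan


def reduce_side_join(records: Iterable[Record], plan: Dict[str, int], num_reducers: int):
    """Shuffle by plan (hash fallback for unprofiled keys), then join per key."""
    shards = [defaultdict(lambda: defaultdict(list)) for _ in range(num_reducers)]
    for key, tag, payload in records:
        reducer = plan.get(key, hash_partition(key, num_reducers))
        shards[reducer][key][tag].append(payload)
    joined = []
    for shard in shards:
        for key, by_tag in shard.items():
            # Pair every "L" record with every "R" record sharing the join key.
            for left in by_tag.get("L", []):
                for right in by_tag.get("R", []):
                    joined.append((key, left, right))
    return joined


if __name__ == "__main__":
    records: List[Record] = (
        [("a", "L", f"a{i}") for i in range(4)] + [("a", "R", f"ra{i}") for i in range(2)]
        + [("b", "L", "b0"), ("b", "L", "b1"), ("b", "R", "rb0"), ("b", "R", "rb1")]
        + [("c", "L", "c0"), ("c", "L", "c1"), ("c", "R", "rc0")]
        + [("d", "L", "d0"), ("d", "R", "rd0"), ("d", "R", "rd1")]
    )
    est = estimate_counts(profile_keys(records[::2]), 0.5)  # assumed 50% stride sample
    plan = feedback_plan(est, 2)
    true_counts = profile_keys(records)
    hash_plan = {k: hash_partition(k, 2) for k in true_counts}
    print("estimated loads:", reducer_loads(est, plan, 2))  # [8.0, 8.0]
    print("hash skew:", skew_factor(reducer_loads(true_counts, hash_plan, 2)))
    print("feedback skew:", skew_factor(reducer_loads(true_counts, plan, 2)))  # 1.125
    print("joined pairs:", len(reduce_side_join(records, plan, 2)))  # 16
```

With the 50% sample in the demo, the estimated loads come out even (8.0 and 8.0) while the true loads are 9.0 and 7.0, a reminder that the plan is chosen against an estimate rather than the real key distribution. The plan keeps every record of a join key on one reducer, as a plain Reduce-side join requires; splitting a single hot key across reducers would need extra replication logic that the sketch omits.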

Original language: English
Title of host publication: Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013
Pages: 1255-1262
Number of pages: 8
DOIs: https://doi.org/10.1109/CSE.2013.187
Publication status: Published - 2013
Event: 2013 16th IEEE International Conference on Computational Science and Engineering, CSE 2013 - Sydney, NSW
Duration: 2013 Dec 3 to 2013 Dec 5

Other

Other: 2013 16th IEEE International Conference on Computational Science and Engineering, CSE 2013
City: Sydney, NSW
Period: 13/12/3 to 13/12/5

Keywords

  • Data skew
  • Feedback
  • Framework
  • Profiling
  • Reduce-side Join

ASJC Scopus subject areas

  • Computer Science (miscellaneous)

Cite this

Nakayama, M., Yamazaki, K., Tanaka, S., & Kasahara, H. (2013). Dynamic profiling and feedback framework for reduce-side join. In Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013 (pp. 1255-1262). [6755369] https://doi.org/10.1109/CSE.2013.187

@inproceedings{d8d342e58c8944c28ea4f3dd0cc0462f,
title = "Dynamic profiling and feedback framework for reduce-side join",
abstract = "MapReduce has become popular, and Reduce-side join is one of the most important applications of MapReduce. Data skew, in which the data load assigned to each Reduce task varies from task to task, increases the MapReduce job completion time. This paper proposes a dynamic profiling and feedback framework that works on a MapReduce cluster. The framework allows programmers to build their own algorithms to address data skew in Reduce-side join based on their specific knowledge and/or requirements. This paper also proposes an estimation method that allows the framework to adapt to a wide range of MapReduce cluster sizes. This paper presents two example algorithms that address data skew using the estimation method, and the experimental results show up to a 2.59-times speed-up in join completion time on a cluster with 50 servers and highly skewed input data.",
keywords = "Data skew, Feedback, Framework, Profiling, Reduce-side Join",
author = "Makoto Nakayama and Kenichi Yamazaki and Satoshi Tanaka and Hironori Kasahara",
year = "2013",
doi = "10.1109/CSE.2013.187",
language = "English",
pages = "1255--1262",
booktitle = "Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013",

}

TY - GEN

T1 - Dynamic profiling and feedback framework for reduce-side join

AU - Nakayama, Makoto

AU - Yamazaki, Kenichi

AU - Tanaka, Satoshi

AU - Kasahara, Hironori

PY - 2013

Y1 - 2013

N2 - MapReduce has become popular, and Reduce-side join is one of the most important applications of MapReduce. Data skew, in which the data load assigned to each Reduce task varies from task to task, increases the MapReduce job completion time. This paper proposes a dynamic profiling and feedback framework that works on a MapReduce cluster. The framework allows programmers to build their own algorithms to address data skew in Reduce-side join based on their specific knowledge and/or requirements. This paper also proposes an estimation method that allows the framework to adapt to a wide range of MapReduce cluster sizes. This paper presents two example algorithms that address data skew using the estimation method, and the experimental results show up to a 2.59-times speed-up in join completion time on a cluster with 50 servers and highly skewed input data.

AB - MapReduce has become popular, and Reduce-side join is one of the most important applications of MapReduce. Data skew, in which the data load assigned to each Reduce task varies from task to task, increases the MapReduce job completion time. This paper proposes a dynamic profiling and feedback framework that works on a MapReduce cluster. The framework allows programmers to build their own algorithms to address data skew in Reduce-side join based on their specific knowledge and/or requirements. This paper also proposes an estimation method that allows the framework to adapt to a wide range of MapReduce cluster sizes. This paper presents two example algorithms that address data skew using the estimation method, and the experimental results show up to a 2.59-times speed-up in join completion time on a cluster with 50 servers and highly skewed input data.

KW - Data skew

KW - Feedback

KW - Framework

KW - Profiling

KW - Reduce-side Join

UR - http://www.scopus.com/inward/record.url?scp=84900380009&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84900380009&partnerID=8YFLogxK

U2 - 10.1109/CSE.2013.187

DO - 10.1109/CSE.2013.187

M3 - Conference contribution

AN - SCOPUS:84900380009

SP - 1255

EP - 1262

BT - Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013

ER -