We consider the problem of scheduling MapReduce workloads. A workload consists of multiple independent flows, and each flow is itself a set of MapReduce jobs with precedence constraints. We model this as a parallel scheduling problem [5, 9], more specifically as precedence constrained malleable scheduling with linear speedup and processor maxima. Each flow is associated with an arbitrary cost function that describes the cost incurred for completing the flow at a particular time. The overall objective is to minimize either the total cost (minisum) or the maximum cost (minimax) of the flows. We use resource augmentation analysis to provide a (2, 3) bicriteria approximation algorithm for general minisum objectives, and a (1, 2) bicriteria approximation algorithm for minimax objectives. We note that (unless P=NP) no finite approximation ratio is possible without resource augmentation. As corollaries of these results, we obtain a 6-approximation algorithm for total weighted completion time (and thus average completion time and average stretch), and a 2-approximation algorithm for maximum weighted completion time (and thus makespan and maximum stretch).

We also demonstrate via extensive simulation experiments the overall performance of our algorithms relative to optimal and also relative to other, standard MapReduce scheduling schemes.

By:* Andrey Balmin, Kirsten Hildrum, Viswanath Nagarajan, Joel Wolf*

Published in: RC25364 in 2013

**LIMITED DISTRIBUTION NOTICE:**

**This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.**

Questions about this service can be mailed to reports@us.ibm.com .