Heterogeneous clusters and grid infrastructures are becoming increasingly popular. In these computing infrastructures, machines have different resources (e.g., memory sizes, disk space, and installed software packages). These differences give rise to a problem of over-provisioning, that is, sub-optimal utilization of a cluster due to users requesting resource capacities greater than what their jobs actually need. Our analysis of a real workload file (LANL CM5) revealed differences of up to two orders of magnitude between requested memory capacity and actual memory usage. The problem of over-provisioning has received very little attention so far. We discuss different approaches for applying machine learning methods to estimate the actual resource capacities used by jobs. These approaches are independent of the scheduling policies and the dynamic resourcematching schemes used. Our simulations show that these methods can yield an improvement of over 50% in utilization (throughput) of heterogeneous clusters.
By: Elad Yom-Tov; Yariv Aridor
Published in: H-0244 in 2006
LIMITED DISTRIBUTION NOTICE:
This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.
Questions about this service can be mailed to reports@us.ibm.com .