The Impact of Noise on the Scaling of Collectives: A Theoretical Approach

The performance of parallel applications running on large clusters is known to degrade because of interference from kernel and daemon activities on individual nodes, often referred to as noise. In this paper, we focus on an important class of parallel applications that repeatedly perform a computation followed by a collective operation such as a barrier. We model this class theoretically and demonstrate rigorously the effect of noise on the scalability of such applications. We study three natural and important classes of noise distributions: the exponential distribution, the heavy-tailed distribution (captured by the Pareto distribution), and the Bernoulli distribution. We show that such systems scale well in the presence of exponential noise, but that performance degrades drastically in the presence of heavy-tailed or Bernoulli noise. The main contribution of this paper is to initiate a formal study of the impact of noise on the scaling of parallel applications. We believe that this study will prove extremely useful in identifying and alleviating scalability bottlenecks more systematically, for instance by designing scheduling policies that take the nature of the noise into account to improve overall system performance.
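To make the scaling intuition concrete, here is a minimal sketch under a simplified model that we assume for illustration; it is not the paper's own analysis. Each of n nodes performs the same computation and then incurs an independent noise delay drawn from one distribution. The barrier completes only when the slowest node arrives, so the per-iteration cost of noise is the expected maximum of n noise samples. The function names and the parameter values in the demo are hypothetical.

```python
import math


def expected_max_exponential(n: int, mean: float) -> float:
    """E[max of n iid Exponential noise samples] = mean * H_n,
    where H_n is the n-th harmonic number (grows like ln n)."""
    return mean * sum(1.0 / k for k in range(1, n + 1))


def expected_max_pareto(n: int, scale: float, alpha: float) -> float:
    """E[max of n iid Pareto(scale, alpha) noise samples], finite only for alpha > 1.

    Closed form: scale * Gamma(n + 1) * Gamma(1 - 1/alpha) / Gamma(n + 1 - 1/alpha),
    evaluated in log space with lgamma to avoid overflow for large n.
    """
    if alpha <= 1:
        raise ValueError("the expected maximum is infinite for alpha <= 1")
    log_ratio = (math.lgamma(n + 1) + math.lgamma(1 - 1 / alpha)
                 - math.lgamma(n + 1 - 1 / alpha))
    return scale * math.exp(log_ratio)


def expected_max_bernoulli(n: int, p: float, delay: float) -> float:
    """Each node is independently delayed by `delay` with probability p;
    the barrier waits the full delay whenever at least one node is hit."""
    return delay * (1 - (1 - p) ** n)


if __name__ == "__main__":
    # Hypothetical parameters: every distribution has mean per-node noise 1.0,
    # so the differences below come only from the shape of each distribution.
    for n in (1, 10, 1_000, 1_000_000):
        print(f"n={n:>9}  "
              f"exp={expected_max_exponential(n, 1.0):10.2f}  "
              f"pareto={expected_max_pareto(n, 1 / 3, 1.5):10.2f}  "
              f"bernoulli={expected_max_bernoulli(n, 0.01, 100.0):10.2f}")
```

Under these assumptions, the exponential column grows only logarithmically in n, the Pareto column grows polynomially (roughly like n^(1/alpha)), and the Bernoulli column saturates at the full delay, which is 1/p times the mean per-node noise. This is one way to see why the shape of the noise distribution, not just its mean, governs how barrier-synchronized applications scale.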

By: Saurabh Agarwal, Rahul Garg, Nisheeth K Vishnoi

Published as: IBM Research Report RI05003, 2005

LIMITED DISTRIBUTION NOTICE:

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

