Visualizing Jobs with Shared Resources in Distributed Environments

In this paper we describe a visualization system that shows the behavior of jobs in large, distributed computing clusters. The system has been in use for two years and is generic enough to be applied in two quite different domains: a Hadoop MapReduce environment and the Watson DeepQA DUCC cluster. Scalable and flexible data processing systems typically run hundreds or more simultaneous jobs. The creation, termination, expansion and contraction of these jobs can be very dynamic and transient, and it is difficult to understand this behavior without showing its evolution over time. Traditional monitoring tools typically show either snapshots of the current load balancing or aggregate trends over time. Our new visualization instead shows the behavior of each job over time, in the context of the cluster, in either a real-time or a post-mortem view. Its new algorithm runs in real-time mode and can make retroactive adjustments to produce smooth layouts. Moreover, our system allows users to drill down to see details about single jobs. The visualization has proven useful for administrators to see the overall occupancy of the cluster, and for users to spot errors or to see how many resources are given to their jobs.
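The abstract does not spell out the layout algorithm. As a rough illustration only, and not the authors' method, the Python sketch below shows one simple way to turn job expansion and contraction events into per-job bands stacked over time, together with overall cluster occupancy. The event format `(time, job_id, delta)`, the function name `build_bands`, and the fixed arrival-order stacking are all hypothetical. The sketch makes no retroactive layout adjustments.

```python
from typing import List, Tuple

# Hypothetical event format: (time, job_id, delta), where delta is the number
# of resource slots a job gains (+) or releases (-) at that time.
Event = Tuple[float, str, int]


def build_bands(events: List[Event], capacity: int):
    """Return one snapshot per distinct event time.

    Each snapshot is (time, bands, occupancy). `bands` lists
    (job_id, lower, upper) slot offsets, stacked in job-arrival order so a
    job's band does not jump around as other jobs grow or shrink.
    """
    allocation = {}  # dicts keep insertion order, i.e. arrival order
    snapshots = []
    # Sort by time only; Python's sort is stable, so same-time events keep input order.
    ordered = sorted(events, key=lambda e: e[0])
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        # Apply every event sharing this timestamp before taking a snapshot.
        while i < len(ordered) and ordered[i][0] == t:
            _, job, delta = ordered[i]
            allocation[job] = allocation.get(job, 0) + delta
            if allocation[job] <= 0:
                del allocation[job]  # job terminated or released all its slots
            i += 1
        # Stack the surviving jobs bottom-up in arrival order.
        bands, offset = [], 0
        for job, slots in allocation.items():
            bands.append((job, offset, offset + slots))
            offset += slots
        snapshots.append((t, bands, offset / capacity))
    return snapshots


# Example: three jobs on a hypothetical 10-slot cluster.
example = [(0, "A", 4), (0, "B", 2), (5, "A", -2), (5, "C", 3), (10, "B", -2)]
snapshots = build_bands(example, capacity=10)
for t, bands, occupancy in snapshots:
    print(t, bands, f"occupancy={occupancy:.0%}")
```

Keeping each job at a stable vertical position is a deliberate simplification. A real-time tool facing many short-lived jobs would likely need a smarter ordering to limit visual churn.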

By: Wim De Pauw, Joel Wolf, Andrey Balmin

Published in: Proceedings of the 2013 First IEEE Working Conference on Software Visualization (VISSOFT), Piscataway, NJ: IEEE, 2013. DOI: 10.1109/VISSOFT.2013.6650535

LIMITED DISTRIBUTION NOTICE:

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

rc25382.pdf
