Dynamic Collaboration in Autonomic Computing

In this paper, we propose a set of requirements, identify a set of standards designed to help support those requirements, and show that systems that partially respect these requirements can indeed exhibit several important aspects of self-management. Section 1.2 sets forth our proposed requirements, which focus on supporting dynamic collaboration among autonomic elements. Next, in section 1.3, we discuss standards and the extent to which they do and do not support those requirements. The next four sections discuss the present status and possible future of two systems that exhibit dynamic collaboration to achieve system-level self-management. The first system, described in section 1.4, is a datacenter prototype called Unity that heals, configures, and optimizes itself in the face of failures, disruptions, and highly varying workload. Unity achieves these self-managing capabilities through a combination of algorithms that support the behavioral requirements, design patterns that introduce new resources such as registries and sentinels that assist other resources, and well-orchestrated interactions. We discuss the benefits that Unity derives from its use of a subset of the requirements and standards. Then, in section 1.5, we speculate on the extra degree of self-management that could be attained were Unity to implement the proposed requirements and standards more fully. The second system, described in section 1.6, focuses more specifically on a particular interaction between two commercially available system components: a workload manager that allocates resources on a fine-grained scale (e.g., CPU and memory share) and a resource arbiter that allocates more coarse-grained resources such as entire physical servers. We discuss how the requirements and standards support dynamic collaboration between the workload manager and the resource arbiter, and in section 1.7 we speculate about how their fuller implementation could yield further benefits. We close in section 1.8 with a summary of our recommendations on requirements and standards, and speculations about the future.

By: David M. Chess; Jeffrey O. Kephart; James E. Hanson; Ian N. Whalley; Steve R. White

Published in: RC23767 in 2005

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
