Today’s skilled IT professionals bring to bear an enormous amount of knowledge about how systems are configured, how they function on a day-to-day basis, and how to repair them when they break. However, there are not enough skilled IT professionals to meet the ever-growing demand. Autonomic computing offers a way out of this dilemma: offload the responsibility of managing complex systems onto the systems themselves, rather than relying on limited human resources.
This approach raises a large challenge: how will we transfer the knowledge about systems management and configuration from the human experts to the software managing the systems? We believe this is fundamentally a knowledge acquisition problem. Our approach to solving it draws on machine learning and knowledge representation. Our core idea is based on programming by demonstration: by observing several human experts each solve a similar problem on different systems, we generalize from traces of their activity to create a robust procedure that is capable of automatically performing the same task in the future. Our solution is based on the observation that solutions to similar problems share similar sub-procedures. By capturing these nuggets of problem-solving knowledge from multiple experts, we form a robust procedure that encapsulates the important parts of the procedures executed by all of the experts.
We are currently employing this approach to acquire deskside technical support procedures, such as upgrading a network card, troubleshooting email problems, and installing a new printer. Our system captures traces of multiple deskside support representatives as they perform one task, such as diagnosing a malfunctioning network adapter, under a variety of operational conditions. Our system then aligns and generalizes these traces into a single general procedure for repairing network adapters. An important feature of our approach is that it works across applications, via instrumentation of the Windows operating system.
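As a rough illustration of what aligning several expert traces can mean, the sketch below finds the steps that every trace shares, in order, and treats everything else as expert-specific variation. It is a toy under stated assumptions, not the system described in this report: the action names and the three traces are hypothetical, each trace is assumed to be a flat list of action labels, and a plain longest-common-subsequence alignment stands in for the learning-based generalization the report refers to.

```python
from functools import reduce
from typing import List, Sequence


def lcs(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Longest common subsequence of two action sequences (classic DP)."""
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start to recover one LCS.
    out: List[str] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def shared_steps(traces: Sequence[Sequence[str]]) -> List[str]:
    """Steps common to all traces, in order.

    Folding pairwise LCS over the traces yields a common subsequence of
    all of them; it is not guaranteed to be the longest one.
    Assumes at least one trace is given.
    """
    return reduce(lcs, traces)


def variant_steps(trace: Sequence[str], shared: Sequence[str]) -> List[str]:
    """Steps in one trace that are not part of the shared skeleton."""
    return [step for step in trace if step not in shared]


# Hypothetical traces of three experts repairing a network adapter.
traces = [
    ["open_network_settings", "select_adapter", "disable_adapter",
     "enable_adapter", "ping_gateway"],
    ["open_network_settings", "select_adapter", "update_driver",
     "ping_gateway"],
    ["open_device_manager", "open_network_settings", "select_adapter",
     "enable_adapter", "ping_gateway"],
]

skeleton = shared_steps(traces)
print(skeleton)
for trace in traces:
    print(variant_steps(trace, skeleton))
```

The shared skeleton plays the role of the common sub-procedure, while the per-trace leftovers are the steps a real system would have to explain, for example as branches taken under different operational conditions.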
This paper describes our formulation of this problem as a machine learning problem. First we define the problem and describe how various problem characteristics affect the difficulty of the learning problem. We then outline the subproblems we have identified, and describe our approach to each. Finally, we conclude with a summary of current results and directions for future work.
By: Tessa Lau, Daniel Oblinger, Lawrence Bergman, Vittorio Castelli, Corin Anderson
Published in: RC23115 in 2004
LIMITED DISTRIBUTION NOTICE:
This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).