Challenges Facing Software Fault-tolerance

As software dominates most discussions in the information technology business, one needs to carefully examine where we are headed in software dependability. This paper re-examines some of the basic premises upon which the area of software fault-tolerance is built and critiques some current practices and beliefs. A few of the thoughts and contributions are:

- The definition of a software failure needs to change from one based on the specification to one based on customer expectation and the ability to do productive work. This will cause a significant shift in what we build fault-tolerance for. However, it would also help narrow the gap between today's theory, practice, and customer need.
- Data on customer problems illustrates that 90% of the problems reported are what we have traditionally considered non-defects, implying no need for a programming change. However, with the new definition of failure, we will need to address these more seriously as a part of fault-tolerance. This change could level the playing field and help achieve greater customer satisfaction.
- A rationale for determining the amount of fault-tolerance, based on the concept of the threshold of pain, is suggested. It helps guide the prioritization of fault-tolerance amongst competing forces, by platform and market segment.

In conclusion, the paper reflects on a few of the development world realities to temper what can be achieved and what we as a community need to be aware of.

By: R. Chillarege

Published in: Conference Proceedings of the First Conference on Fault-Tolerant Systems, 1995

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.
