Software Defects and Their Impact on System Availability: A Study of Field Failures in Operating Systems

        Understanding software defects is fundamental to building systems that avoid or tolerate the failures they cause. Never before has software played as critical a role in determining overall system availability as it does today. This paper uses field data reported over several thousand machine years to develop an understanding of software defects, their nature, and their impact on the system. The results of this study will be of value to several areas: design, development, recovery, fault injection, and modeling. The goal is to characterize defects by attributes that provide insight into their cause at a functional level and into the environment that triggered the defect to cause a failure. We provide distributions for each of these attributes. These distributions provide a baseline for developing fault models, identifying and prioritizing techniques, and guiding fault injection. Given the concern over overlay defects, we have additionally focused on them specifically. To the best of our knowledge, these results provide the first detailed analysis of field software defects in operating system code.

By: M. Sullivan and R. Chillarege

Published in: RC16357 in 1990

This Research Report is not available electronically. Please request a copy from the contact listed below. IBM employees should contact ITIRC for a copy.

Questions about this service can be mailed to reports@us.ibm.com.