Fault Diagnois in Scale-free versus Random Networks

Cost-efficient problem detection and localization is one of the key requirements to a selfmanaging system. In this paper, we consider detection and diagnosis (localization) of faults in large-scale computer networks and distributed systems. Particularly, we investigate the effects of network topology (e.g., scale-free versus random graphs) on the cost-efficiency of detection and diagnosis in terms of the number of tests required and the resulting accuracy of diagnosis. Recent studies suggest that the topology of the Internet, world wide web, peer-to-peer (e.g., Gnutella) and many other real-life networks is quite far from being a random graph (i.e., classical Erdos-Renyi model). Instead, such networks exhibit scale-free properties, following power-low degree distributions (e.g., a small subset of nodes - hubs - are connected to a very large number of while the majority of nodes have quite small degree). Our studies show that the network topology dramatically influences the cost-efficiency of diagnosis, such as the number of tests necessary for problem detection and diagnosis. In our studies, we use as tests end-to-end measurements, or probes (e.g., ping or traceroute command). Scale-free networks appear to be much harder to diagnose than random networks of the same size, as they require significantly larger number of probe stations and end-to-end probes. We investigate the effect of the probe stations’ location on the necessary number of probes. We also estimate the computational complexity of multi-fault diagnosis in scale-free and random networks by Bayesian inference. Our studies provide an important information for planning and deployment of probing-based diagnosis in extremely large-scale practical networks (e.g., large corporate intranets, Internet, and GRID-computing systems).

By: Natalia Odintsova; Irina Rish

Published in: RC23599 in 2005

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rc23599.pdf

Questions about this service can be mailed to reports@us.ibm.com .