Strategies for Problem Determination Using Probing

As distributed systems continue to grow in size
and complexity, scalable and cost-efficient techniques are
needed for performing tasks such as problem determination
and fault diagnosis. In this paper, we address these tasks
using probes, or test transactions, which replace traditional
“passive” event-correlation techniques with a more active,
real-time information-gathering approach. We provide a theoretical
foundation and a set of practical techniques for implementing
efficient probing strategies - the main issue is
minimizing the cost of probing while maximizing the diagnostic
accuracy of the probe set. We show that finding an
optimal probe set is NP-hard and devise polynomial-time approximation
algorithms that demonstrate excellent empirical
performance, even on large networks. We also implement an
active, on-line probing strategy that yields a significant reduction
in the probe set size.

By: Mark Brodie, Irina Rish, Sheng Ma, Alina Beygelzimer, Natalia Odintsova

Published in: RC22928 in 2003

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

RC22928.pdf

Questions about this service can be mailed to reports@us.ibm.com .