Rules of Thumb for Selecting Metrics for Detecting Performance Problems

This paper addresses the selection of metrics so as to facilitate the early detection of performance problems. Our approach combines results from queueing theory and statistical hypothesis testing to develop rules-of-thumb for when one metric is preferred to another. Examples of these rules include:
\begin{itemize}
\item Measures of queue length are more sensitive to performance problems than are measures of utilization.
\item Queue length measures provide more sensitive detection than measures of response times if the performance problem is dominated by an increase in expected arrival rates.
\item Response times are preferred to queue lengths if the performance problem is dominated by an increase in expected service times.
\end{itemize}
These rules are assessed for performance problems in the CPU and paging sub-systems of a production computer system. In all cases, the data are consistent with the rules.
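As a rough illustration of the first two rules, the sketch below compares how much the mean of each metric shifts when the arrival rate rises. It assumes an M/M/1 queue, which is a choice made for this example and is not stated in the abstract; the function names and the rates are hypothetical. It also compares only shifts in the mean, whereas the report's approach uses hypothesis testing, which accounts for variability as well. For that reason the sketch does not attempt to reproduce the third rule: with a fixed arrival rate, Little's law makes the mean queue length and mean response time change by the same factor.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state means for an M/M/1 queue (an assumed model, not the report's)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival_rate must be below service_rate")
    rho = arrival_rate / service_rate
    return {
        "utilization": rho,                                     # fraction of time the server is busy
        "queue_length": rho / (1.0 - rho),                      # mean number of jobs in the system
        "response_time": 1.0 / (service_rate - arrival_rate),   # mean time a job spends in the system
    }


def relative_change(baseline, degraded):
    """Ratio of each degraded metric to its baseline value."""
    return {name: degraded[name] / baseline[name] for name in baseline}


# Hypothetical rates: the problem is an increase in arrival rate only.
baseline = mm1_metrics(arrival_rate=5.0, service_rate=10.0)
more_arrivals = mm1_metrics(arrival_rate=7.0, service_rate=10.0)
ratios = relative_change(baseline, more_arrivals)

for name, ratio in ratios.items():
    print(f"{name}: x{ratio:.2f}")
```

In this example the mean queue length grows by a larger factor than either utilization or mean response time. By Little's law the queue length ratio equals the arrival rate ratio times the response time ratio, so whenever the arrival rate increases the queue length shifts proportionally more than the response time.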

By: Joseph L. Hellerstein

Published in: RC20485 in 1996


This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
