Rules of Thumb for Selecting Metrics for Detecting Performance Problems

This paper addresses how to select metrics that facilitate the early detection of performance problems. Our approach combines results from queueing theory and statistical hypothesis testing to develop rules of thumb for when one metric is preferred to another. Examples of these rules include:
\begin{itemize}
\item Measures of queue length are more sensitive to performance problems than are measures of utilization.
\item Queue length measures provide more sensitive detection than measures of response time if the performance problem is dominated by an increase in expected arrival rates.
\item Response times are preferred to queue lengths if the performance problem is dominated by an increase in expected service times.
\end{itemize}
These rules are assessed for performance problems in the CPU and paging subsystems of a production computer system. In all cases, the data are consistent with the rules.
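
The intuition behind the first two rules can be illustrated with a small numerical sketch. It assumes an M/M/1 queue, which the abstract does not specify, and it compares only the relative change in each metric's mean between a baseline and a degraded operating point. The function names and the chosen rates are hypothetical. The sketch ignores metric variability, which the hypothesis-testing part of the approach would take into account, so it should be read as a rough illustration and not as the paper's method.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state mean metrics of an M/M/1 queue (an assumed model)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival_rate must be below service_rate")
    utilization = arrival_rate / service_rate
    return {
        "utilization": utilization,
        # Mean number of jobs in the system: rho / (1 - rho).
        "queue_length": utilization / (1.0 - utilization),
        # Mean time a job spends in the system: 1 / (mu - lambda).
        "response_time": 1.0 / (service_rate - arrival_rate),
    }


def relative_change(baseline, degraded):
    """Ratio of each degraded metric to its baseline value."""
    return {name: degraded[name] / baseline[name] for name in baseline}


# Hypothetical operating points chosen only for illustration.
baseline = mm1_metrics(arrival_rate=0.5, service_rate=1.0)

# Problem dominated by a higher arrival rate (0.5 -> 0.8 jobs per unit time).
arrival_shift = relative_change(baseline, mm1_metrics(0.8, 1.0))

# Problem dominated by longer service times (mean service time 1.0 -> 1.6).
service_shift = relative_change(baseline, mm1_metrics(0.5, 0.625))

for label, shift in (("arrival-rate increase", arrival_shift),
                     ("service-time increase", service_shift)):
    print(label)
    for name, ratio in shift.items():
        print(f"  {name}: x{ratio:.2f}")
```

Under these assumptions, the arrival-rate increase multiplies mean queue length by about 4, mean response time by about 2.5, and utilization by about 1.6, which matches the ordering in the first two rules. For the service-time increase, the mean queue length and mean response time grow by the same factor, so a comparison of means alone cannot separate them. Any preference between those two metrics in that case has to come from considerations this sketch leaves out, such as their variability.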

By: Joseph L. Hellerstein

Published in: RC20485 in 1996


