Mining Mutually Dependent Patterns for System Management

Copyright © (2002) by IEEE. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distrubuted for profit. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

In some domains, such as isolating problems in computer networks and discovering stock market irregularities, there is more interest in patterns consisting of infrequent, but highly correlated items rather than patterns that occur frequently (as defined by minsup, the minimum support level). Herein, we describe the m-pattern, a new pattern that is defined in terms of minp, the minimum probability of mutual dependence of items in the pattern. We show that all infrequent m-pattern can be discovered by an efficient algorithm that makes use of: (a) a linear algorithm to qualify an m-pattern; (b) an effective technique for candidate pruning based on a necessary condition for the presence of an m-pattern; and (c) a level-wise search for m-pattern discovery (which is possible because m-patterns are downward closed). Further, we consider frequent m-patterns, which are defined in terms of both minp and minsup. Using synthetic data, we study the scalability of our algorithm. Then, we apply our algorithm to data from a production computer network both to show the m-patterns present and to contrast with frequent patterns. We show that when minp = 0 , our algorithm is equivalent to finding frequent patterns. However, with a larger minp, our algorithm yields a modest number of highly correlated items, which makes it possible to mine for infrequent but highly correlated item-sets.To date, many actionable m-patterns have been discovered in production systems.

By: Sheng Ma, Joseph L Hellerstein

Published in: IEEE Journal on Selected Areas in Communications, volume 20, (no 4), pages 726-35 in 2002

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

RC22237.pdf

Questions about this service can be mailed to reports@us.ibm.com .