Discovering Fully Dependent Patterns

Pattern discovery is widely used to analyze market data. To date, the focus has been on frequent patterns. However, in applications such as detecting anomalies in computer networks and identifying security intrusions, frequent patterns characterize normal behavior, which is not of interest in these domains. Rather, the interest is in patterns that precede malfunctions or other undesirable situations. Such patterns are characterized by items that co-occur with high probability, especially long, infrequent patterns (since these provide better predictive capabilities). Unfortunately, defining infrequent patterns in terms of the probability of item co-occurrence yields neither upward nor downward closure, and hence efficient algorithms cannot be constructed. Herein, we circumvent this problem by proposing fully dependent patterns (d-patterns). d-patterns are defined so that all subsets of a d-pattern are also d-patterns, a condition that ensures downward closure. We develop a statistical test to qualify d-patterns, and construct an efficient algorithm for their discovery. We apply our algorithm to data from a network at a large insurance company and show that several patterns of interest are discovered.
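To illustrate why downward closure matters, the following is a minimal Python sketch of a level-wise search that only tests a candidate itemset once all of its smaller subsets have qualified. It is not the algorithm or the statistical test from the report. The names levelwise_search and passes_dependence_check, the min_lift parameter, and the lift-based check itself are assumptions introduced here purely as a stand-in for the report's qualification test.

```python
from itertools import combinations


def passes_dependence_check(itemset, transactions, min_lift=2.0):
    """Stand-in check (an assumption, NOT the report's statistical test).

    Passes when the observed co-occurrence count is at least min_lift
    times the count expected if the items were independent.
    """
    n = len(transactions)
    observed = sum(1 for t in transactions if itemset <= t)
    if observed == 0:
        return False
    expected = float(n)
    for item in itemset:
        expected *= sum(1 for t in transactions if item in t) / n
    return observed / expected >= min_lift


def levelwise_search(transactions, check=passes_dependence_check):
    """Return itemsets (size >= 2) that pass `check` and whose smaller
    subsets (down to pairs) all pass as well.

    Requiring every subset to qualify gives downward closure by
    construction, so candidates can be pruned level by level.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})

    # Level 2: test every pair of items.
    current = {
        frozenset(pair)
        for pair in combinations(items, 2)
        if check(frozenset(pair), transactions)
    }
    found = set(current)
    k = 2
    while current:
        # Join qualified k-itemsets into (k+1)-item candidates.
        candidates = {
            a | b for a in current for b in current if len(a | b) == k + 1
        }
        # Prune: keep a candidate only if all k-subsets qualified,
        # then apply the (stand-in) check to the candidate itself.
        current = {
            c
            for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k))
            and check(c, transactions)
        }
        found |= current
        k += 1
    return found
```

Calling levelwise_search on a list of item sets (for example, sets of event types seen in the same time window) returns the qualifying itemsets as frozensets. Because the search never requires a minimum frequency, a rare group of items that always appear together can still be reported, which is the kind of infrequent pattern the abstract is concerned with.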

By: Sheng Ma, Feng Liang, Joseph L Hellerstein

Published in: RC22336 in 2002

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

RC22336.pdf

Questions about this service can be mailed to reports@us.ibm.com.