Mining Association Rules with Adjustable Accuracy

In this paper, we devise efficient algorithms for mining association rules with adjustable accuracy. Several applications need to mine transaction data frequently in order to track customer behavior. In such applications, mining efficiency can matter more than complete accuracy of the results, and allowing imprecise results can significantly improve efficiency. We develop two methods for mining association rules with adjustable accuracy. Both rely on sampling: they first extract essential knowledge from a sampled subset of the data and then, guided by that knowledge, perform efficient association rule mining on the entire database. A technique that relaxes the minimum support threshold according to the sample size is devised to achieve the desired level of accuracy. The two methods differ in how they use the sampled data, and their performance is analyzed comparatively. Our experimental results show that the relaxation factor and the sample size can be adjusted to improve result accuracy while minimizing execution time, so these two control parameters provide an effective trade-off between accuracy and efficiency. With controlled sampling, the proposed methods are flexible and efficient, and in general produce results of very high accuracy.
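The sampling idea described above can be illustrated with a minimal Python sketch. It is a generic sample-then-verify scheme, not a reconstruction of either of the paper's two methods, whose details are not given here. The function names `relaxed_support`, `frequent_itemsets`, `sample_then_verify` and `recall`, and both relaxation rules (a multiplicative factor `relax`, or a Hoeffding-style margin with confidence parameter `delta`), are hypothetical choices made for illustration. Support is expressed as a fraction of transactions.

```python
import math
import random
from itertools import combinations


def relaxed_support(min_sup, sample_size, relax=None, delta=0.05):
    """Support threshold to use on the sample (hypothetical rules).

    If `relax` is given (0 < relax <= 1), scale min_sup by it. Otherwise
    subtract a Hoeffding-style margin sqrt(ln(1/delta) / (2 * sample_size)),
    which shrinks as the sample grows.
    """
    if relax is not None:
        return min_sup * relax
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * sample_size))
    return max(0.0, min_sup - margin)


def frequent_itemsets(transactions, min_sup):
    """Plain level-wise Apriori; returns {itemset: support fraction}."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    current = [frozenset([i]) for i in {i for t in tsets for i in t}]
    result, k = {}, 1
    while current:
        # Count the support of each candidate k-itemset.
        counts = {c: sum(1 for t in tsets if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_sup}
        result.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune any candidate that has
        # an infrequent (k-1)-subset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return result


def sample_then_verify(transactions, min_sup, sample_size, relax=None, seed=0):
    """Mine a random sample at a lowered threshold, then recount on all data."""
    rng = random.Random(seed)
    sample = rng.sample(transactions, sample_size)
    low = relaxed_support(min_sup, sample_size, relax)
    candidates = frequent_itemsets(sample, low)
    # Recount every candidate on the full database (done per candidate here
    # for clarity; a real implementation would count all of them in one scan).
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    result = {}
    for c in candidates:
        sup = sum(1 for t in tsets if c <= t) / n
        if sup >= min_sup:
            result[c] = sup
    return result


def recall(approx, exact):
    """Fraction of truly frequent itemsets that the approximate run found."""
    return len(set(approx) & set(exact)) / len(exact) if exact else 1.0


if __name__ == "__main__":
    rng = random.Random(42)
    db = [rng.sample("abcdefgh", rng.randint(1, 5)) for _ in range(5000)]
    exact = frequent_itemsets(db, 0.05)
    for relax in (1.0, 0.8, 0.6):
        approx = sample_then_verify(db, 0.05, 500, relax=relax)
        print(f"relax={relax}: recall={recall(approx, exact):.3f}")
```

Because every candidate is recounted on the full database, this sketch never reports an itemset that is not actually frequent; its errors appear only as itemsets that are frequent in the database but were missed in the sample. Lowering `relax` or enlarging `sample_size` reduces such misses at the cost of more candidates to count, which loosely corresponds to the two control parameters the abstract describes.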

By: Jong Soo Park (Sungshin Women's Univ., Korea), Philip S. Yu and Ming-Syan Chen (National Taiwan Univ.)

Published in: RC20695 in 1997

This Research Report is not available electronically. Please request a copy from the contact listed below. IBM employees should contact ITIRC for a copy.

Questions about this service can be mailed to reports@us.ibm.com.