Depth First Generation of Large Itemsets for Association Rules

In this paper we present DepthProject, an algorithm for mining long patterns in databases. The algorithm finds large itemsets by performing a depth-first search on a lexicographic tree of itemsets. Our focus is on CPU-efficient techniques for finding frequent itemsets when the database contains patterns that are very wide. DepthProject achieves up to two orders of magnitude speedup over the recently proposed MaxMiner algorithm for finding long patterns. These techniques may be quite useful for applications in computational biology, in which the number of records is relatively small but the itemsets are very long. Such domains call for pattern-discovery algorithms that are specifically tailored to their characteristics.
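
The general idea of depth-first search on a lexicographic tree of itemsets can be illustrated with a minimal sketch. This is not the DepthProject implementation described in the report: the function name `frequent_itemsets`, the helper `expand`, and the plain list-of-sets transaction format are assumptions made for illustration, and none of the report's CPU-efficiency techniques are reproduced. Each tree node is an itemset, its children extend it only with items that come later in lexicographic order, and support is counted against the transactions that already contain the node.

```python
from typing import Dict, FrozenSet, List, Tuple


def frequent_itemsets(
    transactions: List[FrozenSet[str]], min_support: int
) -> Dict[Tuple[str, ...], int]:
    """Enumerate frequent itemsets depth first over a lexicographic tree.

    Illustrative sketch only; names and data layout are hypothetical.
    """
    result: Dict[Tuple[str, ...], int] = {}

    def expand(
        prefix: Tuple[str, ...],
        candidates: List[str],
        projected: List[FrozenSet[str]],
    ) -> None:
        # Count each candidate extension, looking only at the transactions
        # that contain every item of the current prefix.
        counts = {item: 0 for item in candidates}
        for t in projected:
            for item in candidates:
                if item in t:
                    counts[item] += 1

        frequent = [i for i in candidates if counts[i] >= min_support]
        for idx, item in enumerate(frequent):
            node = prefix + (item,)
            result[node] = counts[item]
            # Children may use only lexicographically later items, so every
            # itemset is generated exactly once.
            later = frequent[idx + 1:]
            if later:
                expand(node, later, [t for t in projected if item in t])

    items = sorted({i for t in transactions for i in t})
    expand((), items, transactions)
    return result
```

Calling `frequent_itemsets` on a list of `frozenset` transactions with an absolute support threshold returns a dictionary that maps each frequent itemset, stored as a sorted tuple, to its support count. Restricting the count at each node to the transactions that contain the node keeps the work per node small as the search goes deeper, which matters when the patterns are long.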

By: Ramesh C. Agarwal, Charu C. Aggarwal, V.V.V. Prasad

Published in: RC21538 in 1999