Parallel Mining of Association Rules: Design, Implementation, and Experience

We consider the problem of mining association rules on a shared-nothing multiprocessor. We present three parallel algorithms that represent a spectrum of trade-offs among computation, communication, memory usage, synchronization, and the use of problem-specific information. We describe the implementation of these algorithms on the IBM POWERparallel SP2, a shared-nothing machine. Performance measurements from this implementation show that the best algorithm, Count Distribution, scales linearly and exhibits excellent speedup and sizeup behavior. Beyond their intrinsic interest, the results of this study provide guidance for designing parallel algorithms for other data mining tasks.
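The abstract names Count Distribution without describing it. As it is generally described, Count Distribution parallelizes Apriori: every processor counts the support of the same candidate itemsets over its own local partition of the transactions, the per-processor counts are then summed across processors, and each processor derives the same set of frequent itemsets before generating the next pass's candidates. The following is a minimal sequential sketch under stated assumptions, not the authors' SP2 implementation: "processors" are simulated as Python lists of transactions, a plain summation stands in for the inter-processor count exchange, `min_support` is an absolute transaction count of at least 1, and all function names are hypothetical.

```python
from itertools import combinations
from collections import Counter

def local_counts(partition, candidates):
    """Count how many local transactions contain each candidate itemset (one processor's work)."""
    counts = Counter()
    for txn in partition:
        items = set(txn)
        for cand in candidates:
            if cand <= items:  # candidate is a subset of this transaction
                counts[cand] += 1
    return counts

def generate_candidates(frequent_prev, k):
    """Apriori-style generation: join (k-1)-itemsets, keep unions whose (k-1)-subsets are all frequent."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k and all(frozenset(s) in prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates

def count_distribution(partitions, min_support):
    """Sequential simulation of Count Distribution (hypothetical helper, not the paper's code).

    Each 'processor' counts the full candidate set on its own partition; no transactions
    move. Summing the per-processor Counters stands in for the count exchange, after which
    every processor would hold identical global supports.
    """
    items = {item for p in partitions for txn in p for item in txn}
    candidates = {frozenset([i]) for i in items}  # pass 1: all single items
    frequent = {}
    k = 1
    while candidates:
        per_proc = [local_counts(p, candidates) for p in partitions]  # local counting
        total = Counter()
        for c in per_proc:
            total.update(c)  # global reduction of count vectors
        level = {c: n for c, n in total.items() if n >= min_support}
        frequent.update(level)
        k += 1
        candidates = generate_candidates(level.keys(), k)
    return frequent
```

For example, with two simulated processors holding `[["a","b","c"], ["a","b"]]` and `[["a","c"], ["b","c"], ["a","b","c"]]` and `min_support=3`, every single item and every pair is frequent, while `{a, b, c}` (support 2) is not. The design point the sketch illustrates is that only counts, never transactions, cross processor boundaries, at the cost of each processor holding the entire candidate set in memory.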

By: Rakesh Agrawal and John C. Shafer

Published as: IBM Research Report RJ10004, 1996
