FARM: A Framework for Exploring Mining Spaces with Multiple Attributes

Mining for frequent itemsets typically involves a preprocessing step in which data with multiple attributes are grouped into transactions and items are defined based on attribute values. We have observed that such fixed attribute mining can severely constrain the patterns that are discovered. Herein, we introduce mining spaces, a new framework for mining multi-attribute data that discovers not only patterns but also transaction and item definitions, exploiting taxonomies and functional dependencies where they are available. We prove that special downward closure properties hold for mining spaces, a result that allows us to construct efficient algorithms for mining patterns without the constraints of fixed attribute mining. We apply our algorithms to synthetic data and to real-world data collected from a production computer network. The results show that exploiting the special kinds of downward closure in mining spaces reduces execution times for mining by a factor of three to four.
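
To make the preprocessing step concrete, the following minimal Python sketch groups multi-attribute records into transactions under one fixed choice of grouping and item attributes, then mines frequent itemsets with the classical downward closure property (every subset of a frequent itemset is itself frequent). It is not the FARM algorithm and does not implement the special closure properties proved in the report. The records, the attribute names (host, window, event, severity), and the functions to_transactions and frequent_itemsets are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical event records; attribute names and values are illustrative only.
RECORDS = [
    {"host": "a", "window": 1, "event": "link_down", "severity": "high"},
    {"host": "a", "window": 1, "event": "timeout", "severity": "low"},
    {"host": "b", "window": 1, "event": "link_down", "severity": "high"},
    {"host": "b", "window": 2, "event": "timeout", "severity": "low"},
    {"host": "a", "window": 2, "event": "link_down", "severity": "high"},
    {"host": "a", "window": 2, "event": "timeout", "severity": "high"},
]


def to_transactions(records, group_by, item_by):
    """Group records into transactions keyed by the `group_by` attributes.

    Each item is the tuple of the record's `item_by` attribute values.
    """
    groups = defaultdict(set)
    for record in records:
        key = tuple(record[attr] for attr in group_by)
        item = tuple(record[attr] for attr in item_by)
        groups[key].add(item)
    return list(groups.values())


def frequent_itemsets(transactions, min_support):
    """Level-wise search that prunes candidates with classical downward closure."""
    # Level 1: count individual items.
    counts = defaultdict(int)
    for transaction in transactions:
        for item in transaction:
            counts[frozenset([item])] += 1
    current = {s for s, n in counts.items() if n >= min_support}
    result = {s: counts[s] for s in current}

    k = 2
    while current:
        items = sorted({item for itemset in current for item in itemset})
        # Keep a size-k candidate only if all its (k-1)-subsets are frequent.
        candidates = [
            frozenset(combo)
            for combo in combinations(items, k)
            if all(frozenset(sub) in current for sub in combinations(combo, k - 1))
        ]
        level_counts = {
            c: sum(1 for t in transactions if c <= t) for c in candidates
        }
        current = {c for c, n in level_counts.items() if n >= min_support}
        result.update({c: level_counts[c] for c in current})
        k += 1
    return result


# Two different fixed choices of transaction and item definitions.
by_host_window = frequent_itemsets(
    to_transactions(RECORDS, group_by=("host", "window"), item_by=("event",)), 2
)
by_window = frequent_itemsets(
    to_transactions(RECORDS, group_by=("window",), item_by=("host", "severity")), 2
)
```

On this toy data the two choices surface different patterns: grouping by host and window with event as the item finds the pair of event types co-occurring, while grouping by window alone with (host, severity) as the item does not. This is the sense in which a fixed choice of attributes constrains what can be discovered.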

By: Chang-Shing Perng, Haixun Wang, Sheng Ma, Joseph L. Hellerstein

Published in: RC21990 in 2001

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

Questions about this service can be mailed to reports@us.ibm.com.