Using Multi-Attribute Predicates for Mining Classification Rules

To improve the efficiency of deriving classification rules from a large training dataset, we develop in this paper a two-phase method for multi-attribute extraction. A feature that is useful in inferring the group identity of a data tuple is said to have good inference power to that group identity. Given a large training set of data tuples, the first phase, referred to as the feature extraction phase, is applied to a subset of the training database to identify useful features that have good inference power to group identities. In the second phase, referred to as the feature combination phase, these extracted features are evaluated together, and multi-attribute predicates with strong inference power are identified. A technique using the match index of attributes is devised to reduce the processing cost, and some theoretical results on the inference power of attributes are derived. The proposed method is evaluated empirically, and sensitivity analysis on various parameters is conducted. Our results show that the proposed two-phase method is in general very efficient and leads to solutions of very high quality.

By: Ming-Syan Chen and Philip S. Yu

Published in: RC20562 in 1996

This Research Report is not available electronically. Please request a copy from the contact listed below. IBM employees should contact ITIRC for a copy.

Questions about this service can be mailed to reports@us.ibm.com.