Use of Randomization to Normalize Feature Merits

Feature merits are used for feature selection in classification and regression, as well as for decision tree generation. Most merit functions exhibit a bias towards features that take a large variety of values. We present a randomization-based scheme for neutralizing this bias by normalizing the merits: the merit of a feature is divided by the expected merit of a feature that is random noise but has the same distribution of values as the given feature. The noise feature is obtained by randomly permuting the values of the given feature. The scheme can be used with any merit function, including the Gini and entropy measures. We demonstrate its effectiveness by applying it to the contextual merit defined by Hong (IBM RC19964, 1994).
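The report itself contains no code; the following Python sketch is only an illustration of the normalization idea under stated assumptions. A Gini-based merit stands in for the merit function (the report's experiments use Hong's contextual merit), and all names (gini_merit, normalized_merit, n_permutations) are hypothetical.

    import numpy as np

    def gini(labels):
        """Gini impurity of a label vector."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def gini_merit(feature, labels):
        """Illustrative merit: reduction in Gini impurity from partitioning
        the examples on the feature's values.  Any merit function (entropy
        gain, contextual merit, ...) could be substituted here."""
        feature, labels = np.asarray(feature), np.asarray(labels)
        n = len(labels)
        weighted = sum(
            (feature == v).sum() / n * gini(labels[feature == v])
            for v in np.unique(feature)
        )
        return gini(labels) - weighted

    def normalized_merit(feature, labels, merit_fn=gini_merit,
                         n_permutations=100, seed=None):
        """Normalize a feature's merit by the expected merit of a 'noise'
        feature obtained by randomly permuting the feature's values: the
        noise feature has the same distribution of values but no real
        relationship to the labels."""
        rng = np.random.default_rng(seed)
        feature = np.asarray(feature)
        raw = merit_fn(feature, labels)
        noise = np.mean([merit_fn(rng.permutation(feature), labels)
                         for _ in range(n_permutations)])
        return raw / noise  # assumes the expected noise merit is nonzero

    # Example: a feature with a unique value per example is pure noise,
    # yet its raw merit is high; the normalized merit is about 1,
    # i.e. no better than chance.
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=200)
    noisy_id = np.arange(200)
    print(gini_merit(noisy_id, labels))          # high raw merit
    print(normalized_merit(noisy_id, labels, seed=1))  # close to 1

The example shows how the permutation baseline damps the bias towards many-valued features: the raw merit of the identifier-like feature is large, but dividing by the expected merit of its permuted (noise) versions brings it back to roughly 1.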

By: Se June Hong, J. R. M. Hosking and Shmuel Winograd

Published in: RC20072, 1995

LIMITED DISTRIBUTION NOTICE:

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

7111.ps.gz

Questions about this service can be mailed to reports@us.ibm.com.