Evaluating Boosting Algorithms to Classify Rare Classes: Comparison and Improvements

Classification of rare events has many important data mining applications.
Boosting is a promising meta-technique that improves the classification
performance of any weak classifier. So far, no systematic study has been
conducted to evaluate how boosting performs for the task of mining rare
classes. In this paper, we evaluate three existing categories of boosting
algorithms for their ability to achieve high recall and high precision for a
given rare class in the context of binary classification. We explain all these
algorithms from the single viewpoint of how their weight updating
mechanisms work at each iteration, and discuss their possible effect on
emphasizing recall or precision. We propose enhanced algorithms in two
of the categories, and justify their choice of weight updating parameters
theoretically. Using some specially designed synthetic datasets, we
compare the capability of all the algorithms from the rare class perspective.
The results support our qualitative analysis of the algorithms, and also
indicate that our enhancements give their predecessor algorithms extra
capability to achieve a better balance between recall and precision.
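
To make the weight-updating viewpoint concrete, the following is a minimal
sketch of one round of the standard AdaBoost update, together with recall
and precision computed for the rare class. It is an illustration only and
is not one of the enhanced algorithms proposed in the report. The function
names adaboost_update and recall_precision, the {-1, +1} label encoding
with +1 as the rare class, and the assumption that the input weights sum
to 1 are all choices made for this example.

```python
import math

def adaboost_update(weights, y_true, y_pred):
    """One round of the standard AdaBoost update (labels in {-1, +1}).

    Assumes `weights` sum to 1. Returns (new_weights, alpha).
    """
    # Weighted error of the current weak classifier.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is +1 for a correct prediction and -1 for a mistake, so
    # mistakes are scaled up and correct predictions are scaled down.
    new = [w * math.exp(-alpha * t * p)
           for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return [w / total for w in new], alpha

def recall_precision(y_true, y_pred, rare=1):
    """Recall and precision with respect to the rare class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == rare and p == rare)
    fn = sum(1 for t, p in pairs if t == rare and p != rare)
    fp = sum(1 for t, p in pairs if t != rare and p == rare)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Toy data (made up for illustration): 2 rare examples out of 8.
y_true = [1, 1, -1, -1, -1, -1, -1, -1]
y_pred = [1, -1, 1, -1, -1, -1, -1, -1]
weights = [1 / len(y_true)] * len(y_true)
new_weights, alpha = adaboost_update(weights, y_true, y_pred)
recall, precision = recall_precision(y_true, y_pred)
```

In this toy run the missed rare example (a false negative) and the false
positive receive the same weight increase, because the standard update
treats all misclassifications alike. Whether different error types should
be weighted differently is the kind of question the weight-update analysis
in the report addresses.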

By: Mahesh V. Joshi, Vipin Kumar, Ramesh C. Agarwal

Published in: RC22147 in 2001

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
