Optimizing Abstaining Classifiers using ROC Analysis (Revised Version)

Classifiers that refrain from classification in certain cases can significantly reduce the misclassification cost. However, the parameters of such abstaining classifiers are often set in a rather ad hoc manner. We propose a method to optimally build a specific type of abstaining binary classifier using ROC analysis. These classifiers are built according to optimization criteria in three models: cost-based, bounded-abstention, and bounded-improvement. We demonstrate how these models can be used to effectively reduce misclassification cost in real classification systems. The method has been validated with a ROC-building algorithm and cross-validation on 15 UCI KDD datasets.
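The idea behind the cost-based model can be illustrated with a small sketch. It is not the report's ROC-based construction. It assumes a single scoring classifier whose two thresholds `t_low <= t_high` define the abstention region (scores in `[t_low, t_high)` are left unclassified). It also reduces the cost matrix to three hypothetical parameters (`c_fp`, `c_fn`, `c_abstain`, with one abstention cost shared by both classes) and replaces the optimization with a brute-force search over observed scores. All function and parameter names are illustrative.

```python
from typing import List, Optional, Sequence, Tuple


def abstaining_predict(score: float, t_low: float, t_high: float) -> Optional[int]:
    """Predict 1 at or above t_high, 0 below t_low, and abstain (None) in between."""
    if score >= t_high:
        return 1
    if score < t_low:
        return 0
    return None


def average_cost(scores: Sequence[float], labels: Sequence[int],
                 t_low: float, t_high: float,
                 c_fp: float, c_fn: float, c_abstain: float) -> float:
    """Average cost per instance; correct predictions cost 0 (simplified cost model)."""
    total = 0.0
    for score, label in zip(scores, labels):
        pred = abstaining_predict(score, t_low, t_high)
        if pred is None:
            total += c_abstain          # hypothetical flat abstention cost
        elif pred == 1 and label == 0:
            total += c_fp               # false positive
        elif pred == 0 and label == 1:
            total += c_fn               # false negative
    return total / len(scores)          # assumes a non-empty sample


def best_thresholds(scores: Sequence[float], labels: Sequence[int],
                    c_fp: float, c_fn: float,
                    c_abstain: float) -> Tuple[float, float, float]:
    """Exhaustively search pairs t_low <= t_high; return (cost, t_low, t_high)."""
    # Observed scores plus +inf cover every distinct split, including
    # "predict all negative" (t_low = t_high = inf) and "abstain on everything".
    candidates: List[float] = sorted(set(scores)) + [float("inf")]
    best: Optional[Tuple[float, float, float]] = None
    for i, t_low in enumerate(candidates):
        for t_high in candidates[i:]:
            cost = average_cost(scores, labels, t_low, t_high, c_fp, c_fn, c_abstain)
            if best is None or cost < best[0]:
                best = (cost, t_low, t_high)
    assert best is not None  # candidates always contains at least +inf
    return best


scores = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
cost, t_low, t_high = best_thresholds(scores, labels, c_fp=1.0, c_fn=1.0, c_abstain=0.2)
# t_low = 0.4, t_high = 0.8: the classifier abstains on the scores 0.4 and 0.6
print(cost, t_low, t_high)
```

When `c_abstain` exceeds the cost of an error, abstention no longer pays off and the search returns a single threshold (`t_low == t_high`), i.e., an ordinary non-abstaining classifier. The exhaustive search is cubic in the number of scores, so it is only meant for small examples.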

Revised Version of RZ Report: June 2, 2005.
A condensed version of this report has appeared in: Proc. 22nd Int'l Conf. on Machine Learning (ICML 2005), Bonn, Germany, ACM Int'l Conf. Proceedings Series, vol. 119 (ACM, New York, August 2005), pp. 665-672.

By: Tadeusz Pietraszek

Published as: Research Report RZ3571, 2004

LIMITED DISTRIBUTION NOTICE:

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).