Some Theoretical and Practical Perspectives on Boosting Weak Predictors

In this paper, we present theoretical and experimental results on boosting weak decision trees. Section 1 gives a broad overview of boosting fundamentals and of techniques used to build predictive classifiers, presents several derivations of the AdaBoost algorithm, and explains theoretically how these derivations relate to one another. In Section 2 we derive specific forms of boosted decision trees by introducing a more consistent minimization criterion. We focus in particular on stumps, which are one-level decision trees, and also describe the more complex Alternating Decision Trees (ADTs). We address the important issue of overfitting together with related regularization techniques, and we introduce a general formulation of the AdaBoost algorithm that extends it to cost-sensitive classification and regularization.
In Section 3, we discuss implementation and computing-performance issues and present the results of experiments conducted with the boosting methods of Section 2 on a number of benchmark datasets as well as a project-management database.
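As background for the boosting methods surveyed in this report, the following is a minimal sketch of the standard (discrete) AdaBoost algorithm with decision stumps as weak learners. It is not the report's specific formulation (which uses a modified minimization criterion and cost-sensitive extensions); all function names and the exhaustive threshold search are illustrative choices.

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustively search for the best one-level decision tree (stump)
    under sample weights w; labels y are in {-1, +1}."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):                      # each feature
        for thr in np.unique(X[:, j]):               # each candidate threshold
            for sign in (1, -1):                     # each polarity
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = np.sum(w * (pred != y))        # weighted error
                if err < best_err:
                    best_err, best = err, (j, thr, sign)
    return best, best_err

def predict_stump(stump, X):
    j, thr, sign = stump
    return np.where(X[:, j] <= thr, sign, -sign)

def adaboost(X, y, T=20):
    """Discrete AdaBoost: reweight examples by exponential loss and
    combine T stumps into a weighted vote."""
    n = len(y)
    w = np.ones(n) / n                               # uniform initial weights
    ensemble = []
    for _ in range(T):
        stump, err = fit_stump(X, y, w)
        err = max(err, 1e-10)                        # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)        # weight of this stump
        pred = predict_stump(stump, X)
        w *= np.exp(-alpha * y * pred)               # up-weight mistakes
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    score = sum(a * predict_stump(s, X) for a, s in ensemble)
    return np.sign(score)
```

The two ingredients that the report's derivations revisit are both visible here: the exponential reweighting of misclassified examples and the log-odds formula for each stump's vote.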

By: Michel Cuendet and Abderrahim Labbi

Published in: IBM Research Report RZ 3402, 2002

LIMITED DISTRIBUTION NOTICE:

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

