Advances in Predictive Model Generation for Data Mining

Expanding application demand for data mining of massive data warehouses has fueled recent advances in automated predictive methods. We first examine a few successful application areas and the technical challenges they present. We discuss theoretical developments in PAC learning and statistical learning theory that led to the emergence of support vector machines. We then examine technical advances that improve model performance, both in accuracy (boosting, bagging, stacking) and in scalability through distributed model generation. We also discuss relatively new techniques for selecting good feature variables, discretizing features, and generating probabilistic models, as well as the use of practical measures of performance.

By: Se June Hong, Sholom M. Weiss

Published in: RC21570 in 1999
