Gradient Boosting for Joint Regression Modeling of Mean and Dispersion

We consider the joint regression modeling of the mean and dispersion of a conditional response that follows a Normal, Gamma or Inverse Gaussian distribution; these are continuous distributions in the exponential dispersion family for which the likelihood function takes a specific, simple form. The regression methodology extends Gradient Boosting (Friedman [8]) to incorporate dispersion modeling. Compared with a similar extension of Generalized Linear Models for dispersion modeling, the proposed approach offers several advantages. These include the easy incorporation of relevant nonlinear and low-order covariate interaction effects in the regression function; robust and computationally efficient modeling procedures suitable for large, high-dimensional data sets; and the ability to use high-cardinality categorical covariates directly in the regression, without the ad hoc preprocessing or grouping of feature levels that other modeling procedures often require for computational tractability. We provide the motivation, background theory and algorithmic details of the proposed methodology, along with illustrative computational examples.
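
The abstract does not spell out the algorithm itself. As a rough illustration of the general idea only, the following Python sketch alternates boosting steps for the mean f(x) and the log-variance g(x) of a Normal response. It is not the report's procedure. The following are all assumptions made for this sketch: a single numeric covariate, regression stumps as base learners, a log link for the variance, a fixed shrinkage `nu`, and Newton/Fisher-scoring step directions (used instead of plain negative gradients so that step sizes do not depend on the scale of y). Every function name is hypothetical.

```python
# Illustrative sketch only (assumptions: Normal response, one covariate,
# regression-stump base learners, log link for the variance).
# Function names are hypothetical; this is not the report's algorithm.
import numpy as np


def fit_stump(x, target, weight):
    """Weighted least-squares regression stump on one feature.

    Returns (threshold, left_value, right_value): the prediction is
    left_value where x <= threshold and right_value otherwise.
    """
    order = np.argsort(x)
    xs, ts, ws = x[order], target[order], weight[order]
    cw = np.cumsum(ws)          # cumulative weights
    cwt = np.cumsum(ws * ts)    # cumulative weighted targets
    total_w, total_wt = cw[-1], cwt[-1]
    wl, sl = cw[:-1], cwt[:-1]  # left-side sums for every candidate split
    wr, sr = total_w - wl, total_wt - sl
    # Reduction in weighted squared error achieved by each candidate split.
    gain = sl**2 / wl + sr**2 / wr - total_wt**2 / total_w
    gain[xs[:-1] == xs[1:]] = -np.inf  # no split between tied x values
    i = int(np.argmax(gain))
    if not np.isfinite(gain[i]):  # all x tied: fall back to a constant fit
        m = total_wt / total_w
        return xs[-1], m, m
    return (xs[i] + xs[i + 1]) / 2, sl[i] / wl[i], sr[i] / wr[i]


def predict_stump(stump, x):
    threshold, left, right = stump
    return np.where(x <= threshold, left, right)


def boost_normal_mean_dispersion(x, y, n_iter=200, nu=0.1):
    """Alternately boost the mean f(x) and log-variance g(x) of a Normal response.

    Per-observation negative log-likelihood, up to a constant:
        0.5 * g + 0.5 * (y - f)**2 * exp(-g)
    """
    f0, g0 = y.mean(), np.log(y.var())
    f = np.full(len(y), f0)
    g = np.full(len(y), g0)
    mean_stumps, disp_stumps = [], []
    for _ in range(n_iter):
        # Mean: Newton direction (y - f), fitted with weights exp(-g) = 1/variance.
        s = fit_stump(x, y - f, np.exp(-g))
        f += nu * predict_stump(s, x)
        mean_stumps.append(s)
        # Log-variance: Fisher-scoring direction (y - f)^2 * exp(-g) - 1,
        # fitted with unit weights (the expected Hessian is constant).
        s = fit_stump(x, (y - f) ** 2 * np.exp(-g) - 1.0, np.ones(len(y)))
        g += nu * predict_stump(s, x)
        disp_stumps.append(s)
    return f0, g0, mean_stumps, disp_stumps, nu


def predict(model, x_new):
    """Return (mean, variance) predictions at x_new."""
    f0, g0, mean_stumps, disp_stumps, nu = model
    f = f0 + nu * sum(predict_stump(s, x_new) for s in mean_stumps)
    g = g0 + nu * sum(predict_stump(s, x_new) for s in disp_stumps)
    return f, np.exp(g)


# Usage: synthetic data whose noise level jumps at x = 0.5.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 500)
sd = np.where(x < 0.5, 0.2, 1.0)
y = 3.0 * x + sd * rng.standard_normal(500)
model = boost_normal_mean_dispersion(x, y)
mean_hat, var_hat = predict(model, x)
```

In this sketch the mean step weights each observation by its current inverse variance, so the dispersion fit feeds back into the mean fit at every iteration. Handling the Gamma or Inverse Gaussian cases, several covariates, or high-cardinality categorical features would need changes that are not shown here.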

By: Ramesh Natarajan

Published as: IBM Research Report RC24863, 2009

LIMITED DISTRIBUTION NOTICE:

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).