Scaling Invariant Principal Component Subset Selection in Principal Component Regression

Multiple regression has found applications in chemistry, pharmacology, and biochemistry
as a tool for understanding molecular activity in quantitative structure-activity
relationship (QSAR) studies, in such diverse areas as near-infrared spectroscopy,
mutational enzyme activity studies, and the analysis of gene expression data from chip
arrays. Error analysis of principal component regression is dominated by the selection
of an optimal subset of principal components, whose quality is measured by their
contribution to the prediction of the dependent variable and by their well-conditioned
behavior. Principal components depend on the scaling and units of measurement of
the independent variables, so the space spanned by any given subset of principal
components is not invariant to scaling transformations, which gives subset selection an
arbitrary character.
This paper presents a solution to the scaling problem: a scale transformation is
constructed that produces a set of equally well-conditioned components, one of which
contains all the predictive information of the regression. This scale transformation is
independent of the initial scaling of the independent variables, which implies that the
problems of conditioning and subset selection are artifacts of the initial scaling of the
independent variables.
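
The scaling dependence described above can be illustrated with a small sketch. This is not the transformation constructed in the paper; it only shows, on made-up data, how changing the units of one independent variable changes which direction the leading principal component picks out, and therefore how well a one-component principal component regression predicts. The data values, the function names, and the choice of two predictors (which allows a closed-form principal axis) are all assumptions made for illustration.

```python
import math

def center(values):
    """Subtract the mean from every entry."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def leading_direction(x1c, x2c):
    """Unit eigenvector of the largest eigenvalue of the 2x2 covariance
    matrix of two centered variables (closed form for the 2x2 case)."""
    n = len(x1c)
    var1 = sum(u * u for u in x1c) / (n - 1)
    var2 = sum(v * v for v in x2c) / (n - 1)
    cov = sum(u * v for u, v in zip(x1c, x2c)) / (n - 1)
    # Angle of the major axis of the symmetric matrix [[var1, cov], [cov, var2]].
    theta = 0.5 * math.atan2(2.0 * cov, var1 - var2)
    return math.cos(theta), math.sin(theta)

def pcr_r_squared(x1, x2, y):
    """R^2 of a regression of y on the first principal component only."""
    x1c, x2c, yc = center(x1), center(x2), center(y)
    w1, w2 = leading_direction(x1c, x2c)
    scores = [w1 * u + w2 * v for u, v in zip(x1c, x2c)]
    slope = sum(t * r for t, r in zip(scores, yc)) / sum(t * t for t in scores)
    residual_ss = sum((r - slope * t) ** 2 for t, r in zip(scores, yc))
    total_ss = sum(r * r for r in yc)
    return 1.0 - residual_ss / total_ss

# Hypothetical data: y is driven entirely by x2, which is recorded in
# units that make its variance small compared with that of x1.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
y = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

r2_original = pcr_r_squared(x1, x2, y)
# Same measurements, with x2 expressed in units 100 times smaller.
r2_rescaled = pcr_r_squared(x1, [100.0 * v for v in x2], y)

print("R^2 with x2 in original units:", round(r2_original, 3))
print("R^2 with x2 rescaled by 100:  ", round(r2_rescaled, 3))
```

In the original units the leading component is almost entirely x1 and explains little of y; after rescaling, the leading component is almost entirely x2 and explains nearly all of it. The underlying measurements are identical in both runs, so the difference comes only from the choice of units.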

By: Daniel E. Platt, Laxmi Parida, Yuan Gao, Isidore Rigoutsos

Published in: RC22074 in 2001

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
