Multivariate Density Estimation from Lower Dimensional Slices

We introduce a new method of statistical density estimation based on projections of higher dimensional data onto lower dimensional subspaces. The problem of estimating densities is expected to be better posed in a lower dimensional subspace because, in lower dimensions, data points can be viewed as relatively less sparse (thus alleviating the problems arising from the curse of dimensionality). While the problem of reconstructing the higher dimensional density function from low dimensional densities is reminiscent of the tomographic reconstruction problem, the reconstruction turns out to be nonunique unless additional constraints are imposed. One such constraint that we consider in the present paper is the maximum entropy criterion. Alternatively, maximum likelihood estimates could also be used. Different data models for the projected data can be assumed within this framework. Among other models, we consider the Gaussian mixture model for this purpose, and show that an Expectation Maximization (EM) strategy for parameter updates can be used to solve the problem. While updates of means and covariances can be obtained more or less in a manner similar to the standard EM algorithm à la, e.g., Dempster-Schafer [8], computation of the best directions of projection, when built into the EM algorithm, essentially involves solving an additional nonlinear optimization problem that is interesting in its own right, and has recently appeared in other statistical problems [5, 17]. Interesting connections of special cases of this optimization problem with known results on the theory of stochastic matrices are pointed out in the context of our discussion. Simple numerical examples are worked out with both real and synthetic data to validate the efficacy of the proposed method.
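As a rough illustration of why a Gaussian mixture is a convenient model for projected data, the sketch below computes the one-dimensional mixture obtained by projecting a multivariate Gaussian mixture onto a single direction: each component keeps its weight, with mean a^T mu_k and variance a^T Sigma_k a. This is not the report's algorithm. The function names `project_gmm` and `gmm_pdf_1d` and the example parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def project_gmm(weights, means, covs, direction):
    """Parameters of a Gaussian mixture projected onto a unit direction.

    Hypothetical helper: projecting x ~ N(mu_k, Sigma_k) onto a unit
    vector a gives a^T x ~ N(a^T mu_k, a^T Sigma_k a); weights are unchanged.
    """
    a = np.asarray(direction, dtype=float)
    a = a / np.linalg.norm(a)                          # normalize the direction
    proj_means = means @ a                             # a^T mu_k for each component
    proj_vars = np.einsum("i,kij,j->k", a, covs, a)    # a^T Sigma_k a for each component
    return weights, proj_means, proj_vars

def gmm_pdf_1d(x, weights, means, variances):
    """Evaluate a one-dimensional Gaussian mixture density at the points x."""
    x = np.asarray(x, dtype=float)[:, None]
    comp = np.exp(-0.5 * (x - means) ** 2 / variances) / np.sqrt(2 * np.pi * variances)
    return comp @ weights

# Illustrative (made-up) two-component mixture in three dimensions.
weights = np.array([0.4, 0.6])
means = np.array([[0.0, 0.0, 0.0], [3.0, 1.0, -2.0]])
covs = np.array([np.eye(3), np.diag([2.0, 0.5, 1.0])])

# Project onto the first coordinate axis.
w, m, v = project_gmm(weights, means, covs, direction=[1.0, 0.0, 0.0])

# Check numerically that the projected density integrates to about one.
grid = np.arange(-10.0, 15.0, 0.01)
total_mass = gmm_pdf_1d(grid, w, m, v).sum() * 0.01
```

Different high-dimensional mixtures can share the same projection along a given direction, which is one way to see why the reconstruction needs an extra criterion such as maximum entropy.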

By: Sankar Basu

Published in: RC22244 in 2001

LIMITED DISTRIBUTION NOTICE:

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
