Estimation of Singular Values of Very Large Matrices using Random Sampling

The singular value decomposition (SVD) has enjoyed a long
and rich history. Although it was introduced in the 1870's
by Beltrami and Jordan for its own intrinsic interest, the
it has become an invaluable tool in applied mathematics
and mathematical modeling. Singular value analysis has been
applied in a wide variety of disciplines, most notably for
least squares fitting of data. More recently, it is being
used in data mining applications and by search engines to
rank documents in very large databases, including the Web.
Recently, the dimensions of matrices which are used in many
mathematical models are becoming so large that classical
algorithms for computing the SVD cannot be used.

We present a new method to determine the largest 10%-25%
of the singular values of matrices which are so enormous
that use of standard algorithms and computational packages
will strain computational resources available to the average
user. In our method, rows from the matrix are randomly
selected, and a smaller matrix is constructed from the
selected rows. Next, we compute the singular values of the
smaller matrix. This process of random sampling and
computing singular values is repeated as many times as
necessary (usually a few hundred times) to generate a set
of training data for neural net analysis. Our method is a
type of randomized algorithm, i.e., algorithms which solve
problems using randomly selected samples of data which are too
large to be processed by coventional means.These algorithms
output correct (or nearly correct) answers most of the
time so long as the input has certain desirable properties.
We list these properties and show that matrices which
appear in information retrieval are fairly well suited for
processing using randomized algorithms. We note, however,
that the probability of arriving at an incorrect answer,
however small, is not naught since an unrepresentative
sample may be drawn from the data.

By: Mei Kobayashi, Georges Dupret, Oliver King and Hikaru Samukawa

Published in: Computers and Mathematics with Applications, volume 42, (no 10-11), pages 1331-1352 in 2001

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.

Questions about this service can be mailed to reports@us.ibm.com .