Efficient Estimation of Singular Values for Searching Very Large and Dynamic Web Databases

The singular value decomposition (SVD) has enjoyed a long
and rich history. Recently, it is being used in data mining
applications and by search engines to rank documents in
very large databases, including the Web. The dimensions of
matrices which appear in these applications are becoming
so large that classical algorithms for computing the SVD
cannot always be used. We present a new method to determine
the largest 10\%--25\% of the singular values of matrices
which are so enormous that use of standard algorithms and
computational packages will strain computational resources
available to most users. In our method, rows from the matrix
are randomly selected, and a smaller matrix is constructed
from the selected rows. Next, we compute the singular
values of the smaller matrix. This process of random
sampling and computing singular values is repeated as many
times as necessary (usually a few hundred times) to
generate a set of training data for neural net analysis.
We demonstrate the power and accuracy of our method
through examples.

By: Mei Kobayashi, Georges Dupret, Oliver King and Hikaru Samukawa

Published in: RT0365 in 2002

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rt0365.pdf

Questions about this service can be mailed to reports@us.ibm.com .