Information retrieval and ranking on the Web: benchmarking studies II

The exponential growth of information available on the
World Wide Web has been documented in numerous studies.
These studies also indicate that Internet users are
turning to search engines and search services in
increasing numbers to find the information they are
seeking, but are not necessarily satisfied with the
services' performance. Specific problems that have been
cited in user surveys include the speed of transmission
and retrieval of information and the format in which
search results are presented. In this report,
we describe some of the components of a new Web-based
search and retrieval system prototype, which is part of
a larger information outlining and visualization system
for Web documents. Our system is based on a modified
version of latent semantic indexing (LSI), whose output
is used to rank the relevance of Web pages to an input
query.
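The abstract does not describe how the authors modified LSI, so the following is only a minimal sketch of textbook LSI ranking in Python with NumPy, not the report's method. It assumes a term-by-document matrix `A` (raw term counts here, an assumption), takes a truncated SVD of rank `k`, folds the query `q` into the same latent space as `U_k^T q`, and ranks documents by cosine similarity. The function name `lsi_rank` and all parameters are hypothetical.

```python
import numpy as np


def lsi_rank(A, q, k):
    """Rank documents (columns of A) against query q with plain rank-k LSI.

    A : (terms x documents) array; raw term counts are assumed for illustration
    q : (terms,) query vector over the same vocabulary
    k : number of singular triplets kept
    Returns (order, scores): document indices by decreasing score, and the scores.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    # Row j is document j in latent coordinates: Sigma_k V_k^T e_j (= U_k^T A e_j)
    docs = Vtk.T * sk
    # Fold the query into the same k-dimensional space: U_k^T q
    q_hat = Uk.T @ q
    denom = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    scores = (docs @ q_hat) / np.maximum(denom, 1e-12)  # cosine similarity
    order = np.argsort(-scores)  # highest score first
    return order, scores
```

When `k` covers every nonzero singular value, this ranking coincides with plain cosine ranking in the original term space; the latent-space effect of LSI appears only when `k` is well below the rank of `A`.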

We compare the speed of retrieval and ranking when
two different methods (subspace iteration, and the Lanczos
method followed by a Sturm sequence method) are used to
compute the singular values of a matrix representation
of document-query space. Hardware- and software-based
methods to speed up computations are presented. In
particular, our discussion will focus on mathematical
algorithms, efficient allocation procedures for dynamic
data structures, and relevance ranking;
linguistic techniques will be discussed only as
necessary for the sake of completeness.
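The abstract names the two eigensolvers but not how they are implemented. The sketch below, in Python with NumPy, assumes the singular values of `A` are obtained as square roots of the eigenvalues of `B = A^T A`, and contrasts (a) subspace iteration with a Rayleigh-Ritz step and (b) Lanczos tridiagonalization (with full reorthogonalization, a choice made here for stability) followed by Sturm-sequence bisection on the tridiagonal matrix `T`. All function names, iteration counts, and tolerances are hypothetical. Forming `A^T A` explicitly squares the condition number and is done only for brevity; a large sparse solver would instead apply `A` and `A^T` as matrix-vector products.

```python
import numpy as np


def subspace_iteration(B, k, iters=500, seed=0):
    """Top-k eigenvalues of symmetric B by subspace (orthogonal) iteration."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((B.shape[0], k)))
    for _ in range(iters):
        Q, _ = np.linalg.qr(B @ Q)  # multiply by B, then re-orthonormalize
    # Rayleigh-Ritz: eigenvalues of the k x k projection of B onto span(Q)
    return np.sort(np.linalg.eigvalsh(Q.T @ B @ Q))[::-1]


def lanczos(B, m, seed=0):
    """m Lanczos steps on symmetric B; returns diagonal alpha, off-diagonal beta of T."""
    n = B.shape[0]
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    q0 = rng.standard_normal(n)
    Q[:, 0] = q0 / np.linalg.norm(q0)
    for j in range(m):
        w = B @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        # Full reorthogonalization (applied twice) against all Lanczos vectors so far
        for _ in range(2):
            w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    return alpha, beta


def sturm_count(alpha, beta, x):
    """Number of eigenvalues of tridiagonal T(alpha, beta) strictly below x."""
    count, d = 0, 1.0
    for i in range(len(alpha)):
        off = beta[i - 1] ** 2 / d if i > 0 else 0.0
        d = (alpha[i] - x) - off  # i-th pivot of the LDL^T factorization of T - xI
        if d == 0.0:
            d = 1e-30  # nudge an exact zero pivot
        if d < 0.0:
            count += 1
    return count


def kth_smallest_eig(alpha, beta, i, tol=1e-12):
    """i-th smallest (1-based) eigenvalue of T(alpha, beta) by Sturm bisection."""
    r = np.zeros(len(alpha))
    r[:-1] += np.abs(beta)
    r[1:] += np.abs(beta)
    lo, hi = np.min(alpha - r), np.max(alpha + r)  # Gershgorin bounds on the spectrum
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if sturm_count(alpha, beta, mid) >= i:
            hi = mid  # at least i eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def top_singular_values(A, k, method="lanczos"):
    """k largest singular values of A, as square roots of eigenvalues of A^T A."""
    B = A.T @ A
    if method == "subspace":
        lam = subspace_iteration(B, k)
    else:
        m = B.shape[0]  # full-length run for this toy size; large problems use m << n
        alpha, beta = lanczos(B, m)
        lam = np.array([kth_smallest_eig(alpha, beta, m - j) for j in range(k)])
    return np.sqrt(np.maximum(lam, 0.0))
```

The Sturm count works because the number of negative pivots in the LDL^T factorization of T - xI equals the number of eigenvalues of T below x (Sylvester's law of inertia), so bisection on x can isolate any single eigenvalue without computing the others.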

By: Oliver King and Mei Kobayashi

Published in: IBM Research Report RT0298, 2002

LIMITED DISTRIBUTION NOTICE:

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).