Information Retrieval and Ranking on the Web: Benchmarking Studies I

The exponential growth of information available on the
World Wide Web has been documented in numerous studies.
These studies also indicate that Internet users are
turning to search engines and search services in
increasing numbers to find the information they seek,
but are not necessarily satisfied with the performance
of these services. Specific problems cited in user
surveys include the speed of transmission and retrieval
of information and the format in which search results
are presented. In this report,
we describe some of the components of a new Web-based
search and retrieval system prototype, which is part of
a larger information outlining and visualization system
for Web documents. Our system is based on a modified
version of latent semantic indexing, whose output is
used to rank Web pages by relevance to an input query.
We report computation speeds for the singular values
and for ranking when Householder bidiagonalization and
Givens rotations are used to compute the singular
values of a matrix representation of document-query
space. In particular, our discussion emphasizes
mathematical algorithms, efficient dynamic memory
allocation procedures, and relevance ranking;
linguistic techniques are discussed only as needed for
completeness.
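
As a rough illustration of the ranking idea summarized above, the
following sketch applies textbook latent semantic indexing to a tiny
term-document matrix. It is not the modified method of the report: the
function name lsi_rank, the toy matrix A, the query q, and the choice
k = 2 are all assumptions made for this example. The decomposition is
delegated to numpy.linalg.svd as a stand-in for the Householder
bidiagonalization and Givens rotation routines the report benchmarks.

```python
import numpy as np


def lsi_rank(term_doc, query, k):
    """Rank documents against a query in a rank-k latent space.

    term_doc: (n_terms, n_docs) array, one column per document.
    query:    (n_terms,) array of query term weights.
    Returns (order, scores): document indices sorted by decreasing
    cosine similarity, and the similarity score of each document.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values descending.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

    # Coordinates of each document in the k-dimensional latent space
    # (equal to U_k.T @ term_doc), stored one document per row.
    docs_k = (s_k[:, None] * Vt_k).T

    # Project the query into the same space.
    q_k = U_k.T @ query

    # Cosine similarity, guarding against zero-length vectors.
    denom = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k)
    scores = (docs_k @ q_k) / np.where(denom == 0.0, 1.0, denom)

    order = np.argsort(-scores)
    return order, scores


# Hypothetical toy data: rows are terms, columns are documents.
# Terms: web, search, ranking, matrix, singular.
A = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
q = np.array([0.0, 1.0, 0.0, 0.0, 0.0])  # query containing only "search"

order, scores = lsi_rank(A, q, k=2)
print("ranking:", order.tolist())
print("scores:", np.round(scores, 3).tolist())
```

In this toy case the two documents that share vocabulary with the
query should score near 1, while the two documents about unrelated
terms should score near 0. A production system would use sparse
storage and a truncated solver rather than a full dense SVD.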

By: Georges DUPRET and Mei KOBAYASHI

Published in: RT0300 in 2002

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
