Authoritative Sources In A Hyperlinked Environment

The link structure of a hypermedia environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. Versions of this principle have been studied in the hypertext research community and (in a context predating hypermedia) through journal citation analysis in the field of bibliometrics. But for the problem of searching in hyperlinked environments such as the World Wide Web, it is clear from current techniques that the information inherent in the links has yet to be significantly exploited. In this work we develop a new method for automatically extracting certain types of information about a hypermedia environment from its link structure, and we report on experiments that demonstrate its effectiveness for a variety of search problems on the WWW.

The central problem we consider is that of determining the relative authority of pages in such environments. This issue is central to a number of basic hypertext search tasks; for example, if the result of a query-based search consists of a large set of relevant pages, one may wish to select a small subset of the most "definitive" or "authoritative" pages to present to a user. At the same time, it is clearly difficult to formulate a definition of authority precise enough to be used in such contexts. We propose and test an algorithmic formulation of the notion of authority in a hyperlinked environment, based on its link structure, together with a method for extracting information about the authority of pages from this structure.
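The abstract does not spell out the formulation itself, so the following is only an illustrative sketch of how authority might be estimated from link structure alone. It assumes a mutually reinforcing scheme in which a page's authority score is the sum of the "hub" scores of pages linking to it, and a page's hub score is the sum of the authority scores of the pages it links to. The function names, the toy link graph, and the fixed iteration count are all hypothetical choices made for the example, not details taken from the report.

```python
from math import sqrt


def normalize(scores):
    """Scale a score dict to unit Euclidean length (left unchanged if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hub_authority_scores(links, iterations=50):
    """Estimate (authority, hub) scores from a link graph.

    links: dict mapping each page to an iterable of pages it links to.
    The iteration count is an assumed, fixed cutoff rather than a
    convergence test.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages pointing at each page.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub update: sum the (new) authority scores of each page's targets.
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, ()))
            for p in pages
        }

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return auth, hub


# Hypothetical toy graph: three pages link to "c", and "d" also links to "e".
example_links = {"a": ["c"], "b": ["c"], "d": ["c", "e"]}
auth, hub = hub_authority_scores(example_links)
top_authority = max(auth, key=auth.get)
top_hub = max(hub, key=hub.get)
```

In this toy graph the page with the most incoming links ends up with the highest authority score, and the page that links to the most well-cited pages ends up as the strongest hub. In a search setting, one would presumably run such a procedure on the set of pages returned for a query and present the top-scoring authorities.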

By: Jon M. Kleinberg

Published in: RJ10076 in 1997

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
