On Text Around Anchors and Its Use in Web IR

This paper reports on a study of the text around anchors and presents the notion of a descriptive snippet: a unit of text that appears alongside an anchor, within a visually distinct arrangement, and describes the target page the anchor links to. We show the new ways in which snippets have been incorporated into retrieval systems, serving as page summaries, retrieval units, and building blocks for intelligent data mining. The paper reviews the implementation of SnipIt, a tool for extracting descriptive snippets, and its incorporation within the InCommonSense summarization system ([3][4]), the Google search engine ([5]), IBM’s Juru search engine ([6]), and IBM’s WebFountain platform ([22]). We show that the extracted descriptive snippets give an expanded, fresh, and unique account of the pages they describe, and provide an encapsulated, coherent unit of text suitable for both retrieval and display of high-quality search results.

By: Einat Amitay

Published in: H-0238 in 2006

LIMITED DISTRIBUTION NOTICE:

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

