Scaling IR-System Evaluation Using Term Relevance Sets

This paper describes an evaluation method based on Term Relevance Sets (Trels) that measures an IR system's quality by examining the content of the retrieved results rather than by looking for pre-specified relevant pages. A Trel consists of a list of terms believed to be relevant to a particular query, along with a list of terms believed to be irrelevant. The proposed method does not involve any document relevance judgments, and as such is not adversely affected by changes to the underlying collection. Therefore, it can better scale to very large, dynamic collections such as the Web. Moreover, this method can evaluate a system's effectiveness on an updatable "live" collection, or on collections derived from different data sources. Our experiments show that the proposed method correlates highly with official TREC measures.
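To illustrate the idea, the following is a minimal sketch of Trels-style scoring: a result list is scored by how often the query's relevant terms appear in the retrieved documents and how rarely its irrelevant terms do. The function name, the term lists, and the simple hit-minus-miss scoring rule are illustrative assumptions, not the paper's actual formula.

    # Hypothetical sketch of Trels-style scoring; the scoring rule and all
    # names below are illustrative assumptions, not the paper's formula.
    from typing import Iterable, Set


    def trels_score(retrieved_docs: Iterable[str],
                    relevant_terms: Set[str],
                    irrelevant_terms: Set[str]) -> float:
        """Score a result list by counting occurrences of relevant terms
        (rewarded) and irrelevant terms (penalized) in the retrieved text."""
        hits = misses = n_docs = 0
        for doc in retrieved_docs:
            n_docs += 1
            tokens = set(doc.lower().split())
            hits += len(tokens & relevant_terms)
            misses += len(tokens & irrelevant_terms)
        if n_docs == 0:
            return 0.0
        # Average net term evidence per retrieved document.
        return (hits - misses) / n_docs


    # Illustrative Trel for a hypothetical query about penguin metabolism.
    relevant = {"penguin", "metabolism", "energy", "diet"}
    irrelevant = {"hockey", "publisher", "linux"}
    results = [
        "penguin diet and energy metabolism in cold climates",
        "pittsburgh penguins hockey schedule",
    ]
    print(trels_score(results, relevant, irrelevant))

Because the score is computed from the retrieved content itself rather than from pre-judged documents, it can be recomputed unchanged after the collection is updated, which is the property the paper exploits for large, dynamic collections.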

By: Einat Amitay, David Carmel, Ronny Lempel, Aya Soffer

Published in: Proceedings of SIGIR 2004: The 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Sheffield, UK. New York: ACM, 2004, pp. 10-17.

LIMITED DISTRIBUTION NOTICE:

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

