Comparing top k lists

Also appeared in SIAM Journal on Discrete Mathematics, vol. 17, no. 1, 2003, p. 134-60

Motivated by several applications, we introduce various distance measures between "top k lists." Some of these distance measures are metrics, while others are not. For each of these latter distance
measures, we show that they are "almost" a metric in the following two seemingly unrelated aspects:

(i) they satisfy a relaxed version of the polygonal (hence, triangle)
inequality, and
(ii) there is a metric with positive constant multiples that bound our
measure above and below.

This is not a coincidence---we show that these two notions of almost being a metric are formally identical. Based on the second notion, we define two distance measures to be equivalent if they are bounded above and below by constant multiples of each other. We thereby identify a large and robust equivalence class of distance measures.

Besides the applications to the task of identifying good notions of (dis-)similarity between two top k lists, our results imply polynomial-time constant-factor approximation algorithms for the rank aggregation problem (Dwork et al., Proc. WWW10, 2001) with respect to a large class of distance measures.

By: Ronald Fagin, Shanmugasundaram Ravikumar, Dandapani SivakumarA

Published in: Configurable Computing: Technology and Applications. Proceedings of 2003 ACM-SIAM Symposium on Discrete Algorithms (SODA 03). , p.28-36 in 2003

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.

Questions about this service can be mailed to reports@us.ibm.com .