Optimal Aggregation Algorithms for Middleware

Assume that each object in a database has m grades, or scores, one for each of m attributes.
For example, an object can have a color grade, that tells how red it is, and a shape grade, that tells
how round it is. For each attribute, there is a sorted list, which lists each object and its grade under
that attribute, sorted by grade (highest grade first). There is some monotone aggregation function, or
combining rule, such as min or average, that combines the individual grades to obtain an overall grade. To determine objects that have the best overall grades, the naive algorithm must access every object in the database, to find its grade under each attribute. Fagin has given an algorithm ("Fagin’s Algorithm", or FA) that is much more efficient. For some distributions on grades, and for some monotone aggregation functions, FA is optimal in a high-probability sense. We give an elegant and remarkably simple algorithm ("the threshold algorithm", or TA) that is optimal in a much stronger sense than FA. We show that TA is essentially optimal, not just for some monotone aggregation functions, but for all of them, and not just in a high-probability sense, but over every database. Unlike FA, which requires large buffers (whose size may grow unboundedly as the database size grows), TA requires only a small, constant-size buffer.

By: Ronald Fagin, Amnon Lotem, Moni Naor

Published in: RJ10205 in 2001

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rj10205.pdf

Questions about this service can be mailed to reports@us.ibm.com .