Join Algorithms For Online Aggregation

        We provide a new family of join algorithms, called ripple joins, for online processing of complex, multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support applications. As opposed to traditional offline join algorithms, which are designed to minimize the time to completion of the query, ripple joins are designed to minimize the time until an acceptably precise estimate of the query result is available, as measured by the length of a confidence interval. This time is independent of the size of the input relations. Ripple joins are adaptive, adjusting their behavior in accordance with the statistical properties of the data, and permit the user to trade off the time between the successive updates of the running aggregate and the amount by which the confidence-interval length decreases at each update. We show how ripple joins can be implemented in an existing DBMS using iterators, and we provide formulas and algorithms both for computing confidence intervals and for adaptively setting the ripple join "aspect-ratio" parameters. In experiments with an initial implementation of our algorithms in the POSTGRES DBMS, the time required to produce reasonably precise online estimates was up to two orders of magnitude smaller than the time required for the best offline join algorithms to produce exact answers.

By: Peter Haas, Joseph Hellerstein

Published in: RJ10126 in 1998


rj10126.ps

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rj10126.ps

Questions about this service can be mailed to reports@us.ibm.com .