Hoeffding Inequalities for Join-Selectivity Estimation and Online Aggregation

        We extend Hoeffding's inequalities for simple averages of random variables to the case of cross-product averages. We also survey some new and existing Hoeffding inequalities for estimators of the mean, variance, and standard deviation of a subpopulation. These results are applicable to two problems in object-relational database management systems: fixed-precision estimation of the selectivity of a join and online processing of aggregation queries. For the first problem, the new results can be used to modify the asymptotically efficient sampling-based procedures of Haas, Naughton, Seshadri, and Swami so that there is a guaranteed upper bound on the number of sampling steps. For the second problem, the inequalities can be used to develop conservative confidence intervals for online aggregation; such intervals avoid the large intermediate storage requirements and undercoverage problems of intervals based on large-sample theory. (This paper is a revision of RJ 10040)

By: Peter J. Haas

Published in: RJ10157 in 1999

This Research Report is not available electronically. Please request a copy from the contact listed below. IBM employees should contact ITIRC for a copy.

Questions about this service can be mailed to reports@us.ibm.com .