We extend Hoeffding's inequalities for simple averages of random variables to the case of cross-product averages. We also survey some new and existing Hoeffding inequalities for estimators of the mean, variance, and standard deviation of a subpopulation. These results are applicable to two problems in object-relational database management systems: fixed-precision estimation of the selectivity of a join and online processing of aggregation queries. For the first problem, the new results can be used to modify the asymptotically efficient sampling-based procedures of Haas, Naughton, Seshadri, and Swami so that there is a guaranteed upper bound on the number of sampling steps. For the second problem, the inequalities can be used to develop conservative confidence intervals for online aggregation; such intervals avoid the large intermediate storage requirements and undercoverage problems of intervals based on large-sample theory. (This paper is a revision of RJ 10040)
By: Peter J. Haas
Published in: RJ10157 in 1999
This Research Report is not available electronically. Please request a copy from the contact listed below. IBM employees should contact ITIRC for a copy.
Questions about this service can be mailed to reports@us.ibm.com .