Large-Sample and Deterministic Confidence Intervals for Online Aggregation

Online aggregation processing in relational database systems enables users to both observe the progress of their aggregation queries and control execution of these queries on the fly. Running confidence intervals are an important component of an online aggregation system and indicate to the user the proximity of each running aggregate to the corresponding final result. Large-sample confidence intervals contain the final query result with a prespecified probability and rest on central limit theorems, while deterministic confidence intervals contain the final query result with probability 1. We provide formulas that can be used to compute both large-sample and deterministic confidence intervals for a variety of aggregates encountered in practice. The formulas are applicable to single-table AVG, COUNT, SUM, VARIANCE, and STD DEV queries with a selection predicate, and to multi-table AVG, SUM, and COUNT queries with join and selection predicates.
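To make the distinction between the two kinds of interval concrete, the following is a minimal Python sketch for the simplest case only: a single-table AVG with no selection predicate. It is not taken from the report and does not reproduce its formulas. The function names, the parameters (`total_rows`, `lower_bound`, `upper_bound`), and the assumptions are illustrative: rows are assumed to arrive in random order, no finite-population correction is applied, and the deterministic interval assumes that the table size and a priori bounds on the column values are known.

```python
import math
from statistics import NormalDist


def large_sample_avg_interval(values, confidence=0.95):
    """Textbook CLT-based interval for a running AVG (hypothetical helper).

    Assumes `values` are the rows seen so far, retrieved in random order.
    The interval contains the final average with probability of roughly
    `confidence` once enough rows have been seen.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two rows to estimate the variance")
    mean = sum(values) / n
    # Sample variance of the rows seen so far.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    # Two-sided standard normal quantile for the requested confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(variance / n)
    return mean - half_width, mean + half_width


def deterministic_avg_interval(values, total_rows, lower_bound, upper_bound):
    """Interval that contains the final AVG with probability 1 (hypothetical helper).

    Assumes every value in the table lies in [lower_bound, upper_bound]
    and that the table holds exactly `total_rows` rows.
    """
    n = len(values)
    if n > total_rows:
        raise ValueError("more rows seen than the table contains")
    seen_sum = sum(values)
    remaining = total_rows - n
    # Worst cases: every unseen row sits at the lower or the upper bound.
    low = (seen_sum + remaining * lower_bound) / total_rows
    high = (seen_sum + remaining * upper_bound) / total_rows
    return low, high


if __name__ == "__main__":
    seen = [2, 4, 4, 6]
    print(large_sample_avg_interval(seen))
    print(deterministic_avg_interval(seen, total_rows=10, lower_bound=0, upper_bound=10))
```

Under these assumptions the deterministic interval is typically much wider than the large-sample interval early in the scan, and it collapses to the exact answer once every row has been seen.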

By: Peter J. Haas

Published in: RJ10050 in 1996

This Research Report is not available electronically. Please request a copy from the contact listed below. IBM employees should contact ITIRC for a copy.

Questions about this service can be mailed to reports@us.ibm.com.