On Computing the Data Cube

        On-Line Analytical Processing (OLAP) applications often require computation of multiple related group-bys. This paper presents fast algorithms for computing a collection of group-bys. We focus first on a special case of the aggregation problem-computation of the cube operator. The cube opeator requires computing group-bys on all possible combinations of a list of attributes. Our algirhtms extend hash-based and sort-based grouping methods with several optimizations, like combining common operations across multiple group-bys, caching, and using pre-computed group-bys for computing other group-bys. Empirical evaluation using real-life OLAP data shows that the resulting algorithms yield a factor of two to eight improvement over straighforward methods and have running times very close to estimated lower bounds. We then extend our algorithms to compute a subset of a cube in which the required set of aggregates does not include all combinations of attributes. Finally, we extend our algorithms to handle the common OLAP case in which attributes are grouped along hierarchies defined on them.

By: S. Sarawagi (Univ. of CA-Berkeley), R. Agrawal and A. Gupta (Oracle Corp.)

Published in: RJ10026 in 1996

This Research Report is not available electronically. Please request a copy from the contact listed below. IBM employees should contact ITIRC for a copy.

Questions about this service can be mailed to reports@us.ibm.com .