Automating Physical Database Design in a Parallel Database: The DB2 Partitioning Advisor

Physical database design is important for query performance in a shared-nothing parallel database system, in which the base data is partitioned among multiple independent nodes. Given a workload, we seek to determine automatically how to partition the base data across multiple nodes to achieve overall optimal performance. Previous attempts use heuristic rules to make those decisions. These approaches fail to consider all of the interdependent aspects of query performance typically modeled by today’s sophisticated query optimizers. This paper presents a comprehensive solution to the problem that has been tightly integrated with the optimizer of a commercial shared-nothing parallel database system-IBM Universal Database Enterprise Extended Edition (EEE). 0 UT approach uses the query optimizer itself both to recommend candidate partitions for each table that will benefit each query in the workload, and to evaluate various combinations of these candidates in an intelligent way that scales with the number of tabIes and queries in the workload. We compare a steepest-ascent heuristic with a genetic algorithm, which quickly converge to an optimal solution on a TPC-H workload that is 35% better than the human-selected partitions. Our experimental results demonstrate the effectiveness of this technique and the scalability of our algorithm for large workloads.

By: Jun Rao, Chun Zhang,Guy Lohman, Nimrod Megiddo

Published in: RJ10208 in 2001

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

RJ10208.pdf

Questions about this service can be mailed to reports@us.ibm.com .