LEO-DB2's Learning Optimizer

Most modem DBMS optimizers rely upon a cost model to choose the best query execution plan
(QEF) for any given query. Cost estimates are heavily dependent upon the optimizer’s estimates
for the number of rows that will result at each step of the QEP for complex queries involving many
predicates and/or operations. These estimates, in turn, rely upon statistics on the database and
modeling Assumptions that may or may not be true for a given database. In this paper we introduce
LEO, DB2’s Leaming Optimizer, as a comprehensive way to repair incorrect statistics and
cardinality estimates of aII query execution plan. By monitoring previously executed queries, LEO
compares the optimizer’s estimates with actuals at each step in a QEP, and computes adjustments
to cost estimates and statistics that may be used during future query optimizations. This analysis
can be done either on-line or off-line on a separate system, and either incrementally or in batches.
In this way, LEO introduces a feedback loop to query optimization that enhances the available
information on the database the most queries have occured, allowing the optimizer to
actually learn from its past mistakes. Our technique is general and can be applied to any operation
in a QEP (not just selection predicates on base tables), including joins, Derived results atIer several
predicates have been applied, and even to DISTINCT and GROUP-BY operators. As shown by
performance measurements on a 10 GB TPC-H data set, the runtime overhead of LEO’s
monitoring is insignificant whereas the potential benefit to response time from more accurate
cardinaly and cost estimates call be orders of magnitude

By: Michael Stillger Guy Lehman Volker Markl Mokhtar Kandil

Published in: RJ10209 in 2002

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

RJ10209.pdf

Questions about this service can be mailed to reports@us.ibm.com .