Distributed SBP Cholesky Factorization Algorithms with Near-Optimal Scheduling

Copyright © (2009) by Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distrubuted for profit or commericial advantage. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

The minimal block storage Distributed Square Block Packed (DSBP) format for distributed memory computing on symmetric and triangular matrices is presented. Three algorithm variants (Basic, Static, and Dynamic) of the blocked right-looking Cholesky factorization are designed for the DSBP format, implemented, and evaluated. On our target machine, all variants outperform standard full storage implementations while saving half the storage. Communication overhead is shown to be virtually eliminated by our Static and Dynamic variants which take advantage of hardware parallelism. The Static and Dynamic variants give nearly the same performance while clearly outperforming the Basic variant which in turn is comparable with standard implementations such as the one found in ScaLAPACK. Models of execution assuming zero communication costs are developed. The Static schedule is near-optimal on our target machine for medium to large problems based on comparisons with these models.

By: Fred Gustavson; Lars Karlsson; Bo Kågström

Published in: ACM Transactions on Mathematical Software , volume 36, (no 2), pages Art. no. 11 in 2009

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rc24342.pdf

Questions about this service can be mailed to reports@us.ibm.com .