A Scalable Implementation of NAS Parallel Benchmark BT on Distributed Memory Systems

In this paper, we describe an efficient and scalable implementation of the NAS Parallel Benchmark BT suitable for distributed-memory systems such as the IBM Scalable POWERparallel systems. After describing the parallelization and data partitioning methods used, we outline some of the optimization steps taken to achieve good performance on individual processors and to reduce communication overhead on the IBM SP1 and SP2 systems. We present performance results on up to 128 nodes of the SP1 and on SP2 systems with wide nodes, for the standard Class A and Class B problem sets. To demonstrate the scalability of our parallelization methods, we also present performance on two additional data sets.

By: Vijay K. Naik

Published in: IBM Systems Journal, vol. 34, no. 2, pp. 273-291, 1995