Shared Memory Programming for Large Scale Machines

In this paper we evaluate the use of a shared memory programming language, Unified Parallel C (UPC), on BlueGene/L, a distributed memory machine. We demonstrate not only that shared memory programming for hundreds of thousands of processors is possible, but also that, with the right support from the compiler and run-time system, the resulting codes perform comparably to MPI implementations.

We describe the compiler infrastructure, the design of the UPC run-time system, and the communication software. We also discuss several compiler transformations that were used to optimize the UPC implementations of three well-known benchmarks (HPC RandomAccess, HPC STREAMS and NAS CG). We present scaling and absolute performance numbers for these benchmarks on up to 131,072 processors, the full BlueGene/L machine.

By: Christopher Barton; Calin Cascaval; George Almási; Yili Zheng; Montse Farreras; José Nelson Amaral

Published as: IBM Research Report RC23853, 2006


