Performance Analysis of the IBM XL UPC on the PERCS Architecture

Unified Parallel C (UPC) has been proposed as a parallel programming language for improving user productivity. IBM recently released a prototype UPC compiler for the PERCS (Power 775 [35]) architecture. In this paper we analyze the performance of the compiler and the platform using various UPC applications. The Power 775 belongs to IBM’s latest generation of supercomputers. It has a hierarchical organization consisting of simultaneous multithreading (SMT) within a core, multiple cores per processor, multiple processors per node (SMP), and multiple SMPs per cluster. A low-latency, high-bandwidth network with specialized accelerators interconnects the SMP nodes (also called octants).

We discuss how XL UPC takes advantage of the hardware features available on this machine to provide scalable performance when using up to 32k cores. We analyze the performance of several benchmarks, describe limitations of some language features and computation patterns, and discuss software and runtime solutions designed to address these limitations.

By: Gabriel Tanase, Gheorghe Almási, Ettore Tiotto, Michail Alvanos, Anny Ly, Barnaby Dalton

Published in: RC25360 in 2013


This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

