Should I Use a GPU? Predicting GPU Performance from CPU Runs

Over the past decade, graphics processing unit (GPU) platforms have evolved into general-purpose programmable accelerators. General-purpose GPU (GPGPU) programming languages such as OpenCL and CUDA bring GPU acceleration within reach of a broad programming audience.

Despite these languages, effective GPU programming remains difficult, for all the reasons that parallel programming is difficult. Beyond general parallel programming challenges such as managing locality, communication, and synchronization, GPUs pose additional challenges in device-specific optimization and portability. To realize significant performance gains, the GPU programmer must carefully tune for low-level, device-specific issues such as branch divergence, thread-group resources, and a distinctive non-uniform memory hierarchy [27]. Not all vendors support the same programming model or API, and the same code can perform very differently on different devices [6].

When considering an application for GPU acceleration, how can the programmer predict whether the port is worth the effort? GPU speedup over an optimized parallel multi-core implementation varies considerably depending on the application and its inputs [22]. Many anecdotes report successful GPU acceleration, but many porting efforts fail to achieve the desired speedup. While general guidelines describe styles of algorithms that match the GPU architecture, no existing method reliably predicts the speedup of an individual application.

This paper addresses the problem of identifying which parallel loops could benefit from GPU acceleration, without incurring the effort of porting them to the new device. Given a program that runs on a CPU, we present a method to predict the speedup achievable on a GPU, based on machine learning over previous porting exercises. In addition, we demonstrate predictive models that choose the best device (i.e., the one delivering the best performance) for an application on a given input.
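To make the idea concrete, the sketch below shows one way such a predictor could be built from CPU-run measurements. It is only an illustration: the feature set (instruction count, memory-operation fraction, branch rate, trip count), the toy training data, and the choice of a nearest-neighbor classifier are assumptions for this example, not the report's actual features or model.

```python
# Illustrative sketch only: features, data, and model choice are hypothetical.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical features measured while running each parallel loop on the CPU:
# [instruction count, memory-op fraction, branch rate, loop trip count]
X = np.array([
    [1.2e9, 0.35, 0.02, 4096],
    [3.4e8, 0.60, 0.10,  512],
    [8.9e9, 0.20, 0.01, 65536],
    [5.0e7, 0.55, 0.15,  128],
])

# Labels from previous porting exercises:
# 1 if the GPU port beat the multi-core CPU version, 0 otherwise.
y = np.array([1, 0, 1, 0])

# Train a simple classifier to answer "should I port this loop to the GPU?"
model = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Predict for a new, unported loop using only its CPU-run features.
new_loop = np.array([[2.1e9, 0.30, 0.03, 8192]])
print("Port loop to GPU?", bool(model.predict(new_loop)[0]))
```

In practice, the features would be gathered by profiling the CPU execution of each candidate loop, and the labels would come from speedups measured in earlier porting exercises.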

By: Ioana Baldini, Stephen J. Fink, Erik Altman

Published in: IBM Research Report RC25487, 2014
