Scheduling for Clustered Vector Processors Near Memory

The Active Memory Cube is a processing-in-memory device that achieves high power efficiency through a carefully designed microarchitecture that eliminates much of the complexity of conventional cores. To deliver high performance, it requires a sophisticated compiler, which we present in this work.

We propose a novel clustering algorithm for distributed functional units and a scheduling algorithm for temporal vector operations within a cluster. We also propose an abstract machine model that precisely captures vector functional units, vector register file sharing at the element level, communication between clusters, and interaction between the core and the memory subsystem. Unlike prior approaches, our clustering algorithm operates at the scope of an entire procedure to make globally optimal assignments.

We achieve high computational efficiency and linear performance scaling on memory- and compute-bound kernels using standard, portable pragmas and no accelerator-specific program code. Our work is an important step toward building next-generation, power-efficient computing systems.
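As an illustration of what "standard, portable pragmas and no accelerator-specific program code" can look like, here is a minimal sketch. It assumes OpenMP 4.0-style offload directives, which the abstract does not name, and the kernel `daxpy` is a hypothetical example rather than code from the report. The loop body is ordinary C; a compiler without offload support ignores the directive and runs the loop on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memory-bound kernel: y[i] = a * x[i] + y[i].
 * The directive below is standard OpenMP (an assumption; the abstract
 * only says "standard, portable pragmas"). Nothing in the loop body is
 * specific to any accelerator. */
void daxpy(int n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);

    /* Request offload and parallel/vector execution of the loop.
     * map() clauses describe which array sections the device reads
     * (x) and which it reads and writes (y). */
    #pragma omp target teams distribute parallel for simd \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

In a setup like this, decisions such as assigning operations to clusters and scheduling vector operations would be left to the compiler rather than expressed in the source.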

By: Arpith C. Jacob, Zehra Sura, Tong Chen, Carlo Bertolli, Samuel Antao, Olivier Sallenave, Kevin O’Brien, Ravi Nair, Jose R. Brunheroto, Philip Jacob, Bryan S. Rosenburg, Yoonho Park, Alexandre E. Eichenberger, Changhoan Kim

Published in: RC25645 in 2016