The Active Memory Cube is a processing-in-memory device that achieves high power efficiency through a carefully designed microarchitecture that eliminates much of the complexity of conventional cores. To deliver high performance, it requires a sophisticated compiler, which we present in this work.
We propose a novel clustering algorithm for distributed functional units and a scheduling algorithm for temporal vector operations within a cluster. We also propose an abstract machine model that precisely captures vector functional units, vector register file sharing at the element level, communication between clusters, and interaction between the core and the memory subsystem. Unlike prior approaches, our clustering algorithm operates at the scope of an entire procedure to make globally optimal assignments.
We achieve high computational efficiency and linear performance scaling on memory- and compute-bound kernels using standard, portable pragmas and no accelerator-specific program code. Our work is an important step toward building next-generation, power-efficient computing systems.
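As an illustration of what "standard, portable pragmas and no accelerator-specific program code" can look like, here is a minimal sketch of a memory-bound kernel. It assumes the pragmas are OpenMP-style directives, which the abstract does not state; the function name `daxpy` and the choice of `parallel for simd` are illustrative assumptions, not taken from the report.

```c
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n): a typical memory-bound kernel.
 *
 * Assumption: the "standard, portable pragmas" are OpenMP directives.
 * The source code contains nothing accelerator-specific; a compiler
 * without OpenMP support ignores the pragma and emits a sequential loop. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The same source compiles unchanged for a conventional host, which is the portability property the abstract emphasizes; how the loop is mapped onto clusters and vector units is left entirely to the compiler.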
By: Arpith C. Jacob, Zehra Sura, Tong Chen, Carlo Bertolli, Samuel Antao, Olivier Sallenave, Kevin O’Brien, Ravi Nair, Jose R. Brunheroto, Philip Jacob, Bryan S. Rosenburg, Yoonho Park, Alexandre E. Eichenberger, Changhoan Kim
Published in: RC25645 in 2016