Performance Evaluation of Inter-Thread Communication Mechanisms on Multicore/Multithreaded Architectures

The three major solutions for increasing the nominal performance of a CPU are multiplying the number of cores per socket, expanding the embedded cache memories, and using multi-threading to reduce the impact of the deep memory hierarchy. Systems with tens or hundreds of hardware threads, all sharing a cache-coherent UMA or NUMA memory space, are today the de facto standard. While these solutions can easily provide benefits in a multi-program environment, they require applications to be recoded to leverage the available parallelism. Application threads must synchronize and exchange data, and overall performance is heavily influenced by the overhead these mechanisms add, especially as developers try to exploit finer-grained parallelism in order to use all available resources.

This paper examines two fundamental synchronization mechanisms, locks and queues, in the context of multi-core and many-core systems with tens of hardware threads. Locks are typically used in non-streaming environments to synchronize access to shared data structures, while queues mainly support streaming computational models. The analysis examines how the algorithmic aspects of the implementation, the interaction with the operating system, and the availability of supporting machine-language mechanisms contribute to overall performance. Experiments are run on Intel x86™ and IBM PowerEN™, a novel, highly multi-threaded, user-space-oriented solution, and focus on fine-grained parallelism, where the work performed on each data item requires only a handful of microseconds. The results presented here constitute both a selection tool for software developers and a data point for CPU architects.

By: Massimiliano Meneghin, Davide Pasetto, Hubertus Franke, Fabrizio Petrini, Jimi Xenidis

Published as: IBM Research Report RC25283, 2012

LIMITED DISTRIBUTION NOTICE:

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

