Temperature-Aware Operating System Scheduling Policies for CMP Architectures

Thermal characteristics of modern microprocessors have presented numerous challenges in recent years. As the technology trends in power dissipation, feature scaling and clock frequencies continue, thermal behavior is expected to be a vital consideration in the next generation of microprocessor architecture design. A wide range of dynamic thermal management (DTM) techniques has been proposed for reducing and managing on-chip temperatures. These schemes include scaling the supply voltage, frequency and/or fetch rate, or even shutting down the clock signal to the processor. However, most of the proposed DTM techniques are based on the principle of “reactive throttling”: some form of performance throttling is invoked after a pre-architected temperature threshold has been exceeded. As such, each such DTM technique incurs performance degradation, and in some cases, if the temperature threshold is set to a relatively low level to conserve package/cooling cost, the degradation can be quite severe. In this paper, we investigate the potential benefits of thermal-aware operating system scheduling for chip multiprocessing (CMP) architectures. The operating system is already designed to interrupt running jobs in accordance with time slice and scheduling parameters, as well as workload characteristics (e.g., I/O and memory behavior). The intuition motivating this work is that adding thermal awareness to the scheduling heuristics will enable us to achieve chip-level temperature reduction without adding any extra performance overhead (unlike hardware-based DTM control mechanisms). We explored various operating system policies and their effect on the temperature behavior of the processor. We developed a temperature estimation scheme for preliminary analysis of the policies and then verified the policies with the Turandot/PowerTimer simulator on traces extracted from the SPEC2000 benchmark suite. Our results indicate that with thermal-aware OS scheduling, hotspot temperatures can be reduced significantly. We observed a 69.2% reduction in the number of thermally critical cycles compared to worst-case thermal scheduling and a 52% reduction compared to random scheduling. On average, our MinTemp scheduling policy leaves fewer than 3% of cycles in thermal violation. There was no appreciable change in net performance compared to the baseline, temperature-unaware OS schedule.
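The idea of folding thermal awareness into an existing scheduling decision can be illustrated with a minimal sketch. It is not the MinTemp policy evaluated in the report, whose details the abstract does not give. It assumes the scheduler has a per-core temperature estimate and a per-job heating estimate available at each time-slice boundary, and it greedily pairs the jobs expected to heat the most with the coolest cores. The function name `assign_jobs_coolest_first`, the data layout, and all numeric values are hypothetical.

```python
from typing import Dict


def assign_jobs_coolest_first(
    core_temps: Dict[int, float],
    job_heat: Dict[str, float],
) -> Dict[str, int]:
    """Greedy thermal-aware placement (illustrative assumption only).

    core_temps: estimated temperature per core id (hypothetical units: deg C).
    job_heat:   estimated heating tendency per job (hypothetical, unitless).
    Returns a mapping from job name to core id.
    """
    # Order cores from coolest to hottest; break ties by core id.
    cores = sorted(core_temps, key=lambda c: (core_temps[c], c))
    # Order jobs from most to least heat-intensive; break ties by name.
    jobs = sorted(job_heat, key=lambda j: (-job_heat[j], j))
    # Pair them off; jobs beyond the number of cores stay unassigned.
    return dict(zip(jobs, cores))


if __name__ == "__main__":
    # Hypothetical snapshot at a time-slice boundary on a 4-core CMP.
    temps = {0: 82.0, 1: 71.5, 2: 78.0, 3: 69.0}
    heat = {"gzip": 0.4, "mcf": 0.2, "crafty": 0.9, "art": 0.6}
    print(assign_jobs_coolest_first(temps, heat))
```

Because the decision is taken at a point where the operating system would switch jobs anyway, a policy of this shape adds only a sort over cores and jobs rather than any throttling, which is the intuition the abstract appeals to.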

By: Eren Kursun; Chen-Yong Cher; Alper Buyuktosunoglu; Pradip Bose

Published in: RC23841 in 2006

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

