RAMP: A Model for Reliability Aware MicroProcessor Design

This report introduces RAMP, an architectural model for measuring long-term processor reliability. With aggressive transistor scaling and rising processor power and temperature, failures caused by wear-out mechanisms are expected to become a significant issue in microprocessor design. Reliability awareness at the microarchitectural design stage will soon be a necessity, and RAMP provides a convenient abstraction for achieving it.

RAMP models chip-wide mean time to failure (MTTF) as a function of the failure rates of individual on-chip structures under different failure mechanisms. It can be used to evaluate the reliability implications of different applications, architectural features, and processor designs.
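As a rough illustration of how per-structure, per-mechanism failure rates could roll up into a chip-wide MTTF, here is a minimal sketch. It assumes a series system with constant failure rates, so the rates simply add and MTTF is the reciprocal of the total; this abstract does not state the combination rule, so treat that as an assumption. The structure names, mechanism names, and FIT values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import defaultdict

# Hypothetical failure rates in FITs (failures per 1e9 device-hours),
# keyed by (structure, mechanism). All names and values are illustrative
# assumptions, not data from the report.
FIT_RATES = {
    ("fpu", "electromigration"): 400.0,
    ("fpu", "dielectric_breakdown"): 600.0,
    ("lsu", "electromigration"): 500.0,
    ("lsu", "dielectric_breakdown"): 500.0,
}


def fit_by_structure(fit_rates):
    """Sum failure rates over mechanisms for each structure."""
    totals = defaultdict(float)
    for (structure, _mechanism), fit in fit_rates.items():
        totals[structure] += fit
    return dict(totals)


def chip_mttf_hours(fit_rates):
    """Chip-wide MTTF in hours, assuming a series system with constant
    failure rates: any single failure fails the chip, so rates add."""
    total_fit = sum(fit_rates.values())
    if total_fit <= 0:
        raise ValueError("total failure rate must be positive")
    return 1e9 / total_fit


if __name__ == "__main__":
    print(fit_by_structure(FIT_RATES))
    print(chip_mttf_hours(FIT_RATES))
```

Under these assumptions, the per-structure totals show which structures dominate the failure rate, which is the kind of comparison an architect would want when weighing design alternatives.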

RAMP is a standalone module that can be attached to any architectural simulator that generates power and temperature measurements. It has so far been ported to IBM’s Turandot processor simulator and the RSIM architectural simulator.

By: Jayanth Srinivasan, Sarita V. Adve, Pradip Bose, Jude Rivers, Chao-Kun Hu

Published in: RC23048 in 2003

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).


Questions about this service can be mailed to reports@us.ibm.com .