The Impact of Technology Scaling on Processor Lifetime Reliability

The relentless scaling of CMOS technology has provided a steady increase in processor performance for the past two decades. However, increased power densities (and hence temperatures) and other scaling effects have an adverse impact on long-term processor lifetime reliability. This paper represents a first attempt at quantifying the impact of scaling on lifetime reliability due to intrinsic hard errors, taking workload characteristics into consideration. For our quantitative evaluation, we use RAMP [20], a previously proposed industrial-strength model that provides reliability estimates for a workload, but only for a given technology. We extend RAMP by adding scaling-specific parameters to enable workload-dependent lifetime reliability evaluation at different technologies. We show that (1) scaling has a significant impact on processor hard failure rates – on average, we find the failure rate of a 65 nm processor to be 316% higher than that of a similarly pipelined, scaled 180 nm processor; (2) of all the failure mechanisms, time-dependent dielectric breakdown and stress migration are the most significant, due to increasing temperatures, less-than-ideal voltage scaling, and reduced interconnect dimensions; and (3) with scaling, the difference in reliability between running at worst-case and typical workload operating conditions increases significantly, as does the difference between running different workloads. Our results imply that leveraging a single microarchitecture design for multiple remaps across a few technology generations will become infeasible; that microarchitects must incorporate lifetime reliability awareness at the early design stage; and that this awareness must distinguish workload-specific from worst-case considerations.

By: Jayanth Srinivasan, Sarita V. Adve, Pradip Bose, Jude Rivers

Published in: RC23047 in 2003


