Analysis of a New Intra-Disk Redundancy Scheme for High-Reliability RAID Storage Systems in the Presence of Unrecoverable Errors

Today's data storage systems are increasingly adopting low-cost disk drives that have higher capacity but lower reliability, leading to more frequent rebuilds and to a higher risk of unrecoverable media errors. We propose a new XOR-based intra-disk redundancy scheme, called interleaved parity check (IPC), to enhance the reliability of RAID systems that incurs only negligible I/O performance degradation. The proposed scheme introduces an additional level of redundancy inside each disk, on top of the RAID redundancy across multiple disks. The RAID parity provides protection against disk failures, while the proposed scheme aims to protect against media-related unrecoverable errors. A comparison between the proposed scheme and traditional redundancy schemes based on Reed-Solomon (RS) codes and single-parity-check (SPC) codes is conducted by analytical means. A new model is developed to capture the effect of correlated unrecoverable sector errors. The probability of an unrecoverable failure associated with these schemes is derived for the new correlated model as well as for the simpler independent error model. Furthermore, we derive closed-form expressions for the mean time to data loss of RAID 5 and RAID 6 systems in the presence of unrecoverable errors and disk failures. We then combine these results for a comprehensive characterization of the reliability of RAID systems that incorporate the proposed intra-disk redundancy scheme. Our results show that in the practical case of correlated errors, the proposed scheme provides the same reliability as the optimum albeit more complex RS coding scheme. Finally, the throughput performance of incorporating the intra-disk redundancy on various RAID systems is evaluated by means of event-driven simulations.

By: Ajay Dholakia, Evangelos Eleftheriou, Xiao-Yu Hu, Ilias Iliadis, Jai Menon and KK Rao

Published in: RZ3652 in 2006

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rz3652.pdf

Questions about this service can be mailed to reports@us.ibm.com .