OFC: An Optimized Flash Controller for Enterprise Storage Systems

NAND Flash technology has become the prime candidate for future high-performance enterprise storage applications. However, the architecture and design of Flash controllers have been primarily geared to consumer-market requirements, so that they typically neither sustain high write IOPS and short latencies nor offer the endurance and reliability required for enterprise storage applications. We present a flexible Flash controller architecture called Optimized Flash Controller (OFC) that fulfills all enterprise storage requirements and also serves as a generic platform to further improve the performance of Flash management algorithms. We show the interplay between hardware and firmware and its performance impact on a Flash controller. Until now, the relationship between the actual controller design and performance behavior that depends on prior workloads could only be observed from a black-box perspective. Further, our results for sequential and pseudo-random read/write workloads show how the number of relocated pages affects the overall performance. The results are consistent with existing write amplification models and can be used to assess other Flash management algorithms.
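As a rough illustration of why the number of relocated pages matters, the sketch below uses the common textbook definition of write amplification: physical page writes divided by host page writes. It is not the model used in the report. The function names and the assumption that sustained write throughput scales inversely with write amplification are hypothetical simplifications, which ignore the read cost of relocation and block erase time.

```python
def write_amplification(host_pages: int, relocated_pages: int) -> float:
    """Physical page writes per host page write (textbook definition).

    relocated_pages counts the valid pages that garbage collection
    had to copy elsewhere while the host wrote host_pages pages.
    """
    if host_pages <= 0:
        raise ValueError("host_pages must be positive")
    if relocated_pages < 0:
        raise ValueError("relocated_pages must not be negative")
    return (host_pages + relocated_pages) / host_pages


def effective_write_rate(raw_rate: float, host_pages: int,
                         relocated_pages: int) -> float:
    """Host-visible write rate under the simplifying assumption that
    relocation writes compete one-to-one with host writes."""
    return raw_rate / write_amplification(host_pages, relocated_pages)


if __name__ == "__main__":
    # No relocations: every physical write is a host write.
    print(write_amplification(1000, 0))
    # One relocated page per two host pages.
    print(write_amplification(1000, 500))
```

Under this simplified view, a workload that forces as many relocations as host writes would halve the host-visible write rate.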

The OFC is built on an FPGA evaluation board and includes Flash-channel controllers, an embedded processor, and SDRAM. Owing to its flexible architecture, new Flash-management algorithms can easily be developed in software and tested on real Flash memory devices. The controller firmware is built as a Linux kernel module, avoiding expensive context switches. Moreover, we developed a Flash-channel simulator that allows kernel code to be tested in user space using User-Mode Linux.

The current OFC version supports four Flash channels and two pipelined dies per channel, reaching a total capacity of 32 GiB. Measurements on our prototype show that we can achieve a maximum sustained sequential throughput of 115 MB/s reading and 72 MB/s writing. Moreover, we achieve a maximum sustained 4 KiB random performance of 24 KIOPS reading and 8.5 KIOPS writing.
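The figures above can be related to each other with a little arithmetic. The following sketch derives the capacity per die and converts the 4 KiB random IOPS figures into data rates. It assumes that all dies have equal capacity, that "KIOPS" means 1000 operations per second, and that "MB" means 10^6 bytes. None of these conventions is stated in the abstract, and the function names are illustrative only.

```python
KIB = 1024  # bytes per KiB


def capacity_per_die_gib(total_gib: float, channels: int,
                         dies_per_channel: int) -> float:
    """Capacity of one die, assuming all dies are the same size."""
    return total_gib / (channels * dies_per_channel)


def kiops_to_mb_per_s(kiops: float, request_bytes: int = 4 * KIB) -> float:
    """Data rate in MB/s (10^6 bytes) for a given KIOPS figure,
    assuming 1 KIOPS = 1000 requests per second."""
    return kiops * 1000 * request_bytes / 1e6


if __name__ == "__main__":
    # Four channels with two dies each, 32 GiB in total.
    print(capacity_per_die_gib(32, 4, 2), "GiB per die")
    # 4 KiB random reads and writes expressed as data rates.
    print(round(kiops_to_mb_per_s(24), 1), "MB/s random read")
    print(round(kiops_to_mb_per_s(8.5), 1), "MB/s random write")
```

Under these assumptions, the random-read data rate comes out somewhat below the 115 MB/s sequential read rate, and the random-write data rate is roughly half the 72 MB/s sequential write rate.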

By: R.A. Pletka, M. Varsamou, M. Bjoerkqvist, Th. Antonakopoulos, P. Mueller, R. Haas, E. Eleftheriou

Published in: RZ3795 in 2011

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
