ExaPlan: Queueing-Based Data Placement and Provisioning for Large Tiered Storage Systems (Revised Version August 2015)

Multi-tiered storage, in which each tier comprises one type of storage device (e.g., SSD or HDD), is a commonly used approach to achieving both high performance and cost efficiency in large-scale systems that must store data with vastly different access characteristics. By aligning the access characteristics of the data with the characteristics of the storage devices, higher performance can be achieved for any given cost. This article presents ExaPlan, a method that determines both the data-to-tier assignment and the number of devices in each tier that minimize the system’s mean response time for a given budget and workload. In contrast to other methods, which constrain or minimize the system load, ExaPlan directly minimizes the system’s mean response time as estimated by a queueing model. Minimizing the mean response time is typically intractable because the resulting optimization problem is both non-convex and combinatorial. ExaPlan circumvents this intractability by introducing a parameterized data-placement approach, which makes it highly scalable and readily applicable to exascale systems. Experiments using parameters from real-world storage systems, such as those at CERN and LOFAR, demonstrate that ExaPlan provides solutions that yield lower mean response times than previous works. ExaPlan can also determine the data-to-tier assignment both at the level of files and at the level of fixed-size extents. For some of the workloads evaluated, file-level placement yielded a significant performance improvement over extent-level placement.
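
To make the shape of the optimization concrete, here is a minimal sketch of jointly choosing per-tier device counts under a budget and a data-to-tier assignment by minimizing a queueing estimate of the mean response time. It is not ExaPlan's model or its parameterization. It assumes, hypothetically, that every device behaves as an M/M/1 queue, that a tier's load is split evenly across its devices, and that placement is controlled by a single integer k: the k files with the highest access density (requests per unit of stored data) go to the fast tier and the rest go to the slow tier. The names Tier, File, plan and all numbers are illustrative.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Tier:
    """Hypothetical device type; all parameters are illustrative."""
    name: str
    cost: float          # cost per device
    capacity: float      # storage capacity per device (e.g., GB)
    service_time: float  # mean service time per request (seconds)


@dataclass(frozen=True)
class File:
    """Hypothetical data item with a size and a mean request rate."""
    size: float  # same unit as Tier.capacity
    rate: float  # mean request arrival rate (requests/second)


def tier_response_time(rate, n_devices, tier):
    """Mean response time of a tier whose load is split evenly over
    n_devices, each modeled as an M/M/1 queue: R = s / (1 - rho)."""
    if rate == 0:
        return 0.0
    if n_devices == 0:
        return math.inf
    rho = (rate / n_devices) * tier.service_time  # per-device utilization
    return math.inf if rho >= 1 else tier.service_time / (1 - rho)


def mean_response_time(ranked, tiers, counts, k):
    """System mean response time when the first k files of `ranked` go to
    tiers[0] and the rest to tiers[1]; inf if a tier overflows or saturates."""
    groups = (ranked[:k], ranked[k:])
    total_rate = sum(f.rate for f in ranked)
    result = 0.0
    for tier, n, group in zip(tiers, counts, groups):
        if sum(f.size for f in group) > n * tier.capacity:
            return math.inf  # the assigned data does not fit on n devices
        rate = sum(f.rate for f in group)
        if rate > 0:
            # Weight each tier's response time by its share of the requests.
            result += (rate / total_rate) * tier_response_time(rate, n, tier)
    return result


def plan(files, tiers, budget, max_devices=20):
    """Exhaustive search over two-tier device counts (within budget) and over
    the placement parameter k; returns (mean response time, counts, k)."""
    # Rank files by access density (requests per unit of stored data).
    ranked = sorted(files, key=lambda f: f.rate / f.size, reverse=True)
    best = (math.inf, None, None)
    for counts in product(range(max_devices + 1), repeat=2):
        if sum(n * t.cost for n, t in zip(counts, tiers)) > budget:
            continue
        for k in range(len(ranked) + 1):
            r = mean_response_time(ranked, tiers, counts, k)
            if r < best[0]:
                best = (r, counts, k)
    return best


# Toy two-tier instance (illustrative numbers only, not from the paper).
ssd = Tier("SSD", cost=10.0, capacity=100.0, service_time=0.001)
hdd = Tier("HDD", cost=2.0, capacity=1000.0, service_time=0.01)
files = [File(size=50, rate=200), File(size=400, rate=20), File(size=800, rate=5)]
r, counts, k = plan(files, (ssd, hdd), budget=40)
print(f"mean response time = {r * 1e3:.3f} ms, devices = {counts}, files on SSD = {k}")
```

The exhaustive loop is only feasible for toy instances. The abstract's point is precisely that the full problem is combinatorial and non-convex, so a brute-force search like this one is best read as a small reference against which a scalable placement method could be checked, not as a scalable method itself.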

A shortened version of this paper can be found in the Proceedings of the 22nd IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) (IEEE, October 2015), pp. 218-227.

By: Ilias Iliadis, Jens Jelitto, Yusik Kim, Slavisa Sarafijanovic, Vinodh Venkatesan

Published as: Research Report RZ3887, 2015

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
