ACE: Classification for Information Lifecycle Management

One of the principal problems in Information Lifecycle Management (ILM) is to align the business value of data with the most cost-effective and appropriate storage infrastructure. In this paper, we introduce ACE, a framework of tools for ILM that classifies data and storage resources and generates a data placement plan for informed utilization of the storage resources available in the system. The goal of ACE is to design a data placement plan that provides cost benefits to an organization while allowing efficient access to all important data. To achieve this goal, ACE uses a policy-based approach that classifies data by its metadata attributes and storage by its capabilities. The main advantage of using ACE is that it enables appropriate usage of under-utilized storage systems without extensive human intervention. Because the architecture is policy-based, ACE also automates the process of data valuation and storage classification.
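The policy-based flow described above (value the data from its metadata, then map it onto storage) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the ACE report: the `FileMeta` and `StorageTier` fields, the three example policies and their scores, the "high"/"low" threshold, and the greedy placement rule are all hypothetical, and cost per GB is used as a stand-in for storage capability.

```python
from dataclasses import dataclass


@dataclass
class FileMeta:
    # Hypothetical metadata attributes; the report does not list these.
    name: str
    size_gb: float
    days_since_access: int
    owner_role: str


@dataclass
class StorageTier:
    # Hypothetical capability summary; cost is a proxy for performance.
    name: str
    cost_per_gb: float
    free_gb: float


# Hypothetical policies: each is (predicate on metadata, score contribution).
POLICIES = [
    (lambda f: f.days_since_access <= 30, 5),    # recently used data
    (lambda f: f.owner_role == "clinician", 3),  # business-critical owner
    (lambda f: f.name.endswith(".tmp"), -4),     # scratch files
]


def data_value(f: FileMeta) -> int:
    """Sum the scores of every policy whose predicate matches the file."""
    return sum(score for pred, score in POLICIES if pred(f))


def classify(f: FileMeta) -> str:
    """Assumed two-class split on an arbitrary threshold of 5."""
    return "high" if data_value(f) >= 5 else "low"


def plan_placement(files, tiers):
    """Greedy plan: high-value files try the costliest tier first,
    low-value files try the cheapest tier first. Files that fit
    nowhere are left out of the plan."""
    by_cost = sorted(tiers, key=lambda t: t.cost_per_gb)
    free = {t.name: t.free_gb for t in tiers}
    plan = {}
    for f in sorted(files, key=data_value, reverse=True):
        order = reversed(by_cost) if classify(f) == "high" else by_cost
        for tier in order:
            if free[tier.name] >= f.size_gb:
                free[tier.name] -= f.size_gb
                plan[f.name] = tier.name
                break
    return plan


files = [
    FileMeta("scan.dcm", 2.0, 3, "clinician"),
    FileMeta("old.log", 1.0, 400, "admin"),
]
tiers = [StorageTier("fast", 1.0, 10.0), StorageTier("archive", 0.1, 10.0)]
print(plan_placement(files, tiers))
```

In this toy run the recently accessed clinician file lands on the costlier tier and the stale log on the cheaper one. A real deployment would presumably draw policies from administrators and measure storage capabilities rather than hard-coding them as done here.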

We implement the ACE framework and evaluate its benefits on three real data sets. One data set consists of 1.28 million anonymous medical industry record files with a total size of 1461 GB, and we show that ACE provides a cost benefit of greater than 70% over the lifetime of the data. In addition to the novel valuation algorithms and overall architecture, we also demonstrate optimizations that reduce the total performance time to 85% of the time taken without these optimizations, while still maintaining classification accuracy of over 85%.

By: Gauri Shah; Kaladhar Voruganti; Piyush Shivam; Maria Alvarez

Published in: RJ10372 in 2006

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

