A Secure Data Deduplication Scheme for Cloud Storage

Nowadays, more and more corporate and private users outsource their data to cloud storage providers. At the same time, recent data breach incidents make end-to-end encryption an increasingly prominent requirement. Unfortunately, semantically secure encryption schemes render various cost-effective storage optimization techniques, such as data deduplication, completely ineffective. In this paper, we present a novel encryption scheme that guarantees semantic security for unpopular data and provides weaker security and better storage and bandwidth benefits for popular data. This way, data deduplication can be effective for popular data, whilst semantically secure encryption protects unpopular content, preventing its deduplication. Transitions from one mode to the other take place seamlessly at the storage server side if and only if a file becomes popular. We show that our scheme is secure under the Symmetric External Decisional Diffie-Hellman Assumption in the random oracle model, and evaluate its performance with benchmarks and simulations.
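The tension the abstract describes can be illustrated with a minimal sketch. It contrasts randomized encryption, where identical files yield different ciphertexts and so cannot be deduplicated, with convergent encryption, a well-known deterministic alternative in which the key is derived from the content itself. This is an illustration only and is not claimed to be the construction used in the report. The stream cipher below is a toy built from SHA-256 so that the example needs only the standard library, and every function name is hypothetical.

```python
import hashlib
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (illustration only, not for production use):
    XOR the data with SHA-256(key || nonce || counter) blocks."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def randomized_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Fresh random nonce per call: equal plaintexts give different ciphertexts."""
    nonce = os.urandom(16)
    return nonce + keystream_xor(key, nonce, plaintext)


def convergent_encrypt(plaintext: bytes) -> bytes:
    """Key derived from the content: equal plaintexts give equal ciphertexts."""
    key = hashlib.sha256(plaintext).digest()
    return keystream_xor(key, b"\x00" * 16, plaintext)


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same keystream inverts convergent_encrypt."""
    return keystream_xor(key, b"\x00" * 16, ciphertext)


def dedup_store(ciphertexts):
    """Server-side deduplication: keep one copy per distinct ciphertext."""
    return {hashlib.sha256(c).hexdigest(): c for c in ciphertexts}


if __name__ == "__main__":
    doc = b"same file uploaded by two users"

    # Two users encrypt the same file under their own random keys.
    randomized = [randomized_encrypt(os.urandom(32), doc) for _ in range(2)]
    print("copies stored (randomized):", len(dedup_store(randomized)))

    # Two users encrypt the same file convergently.
    convergent = [convergent_encrypt(doc) for _ in range(2)]
    print("copies stored (convergent):", len(dedup_store(convergent)))
```

The deterministic variant lets the server store a single copy, but anyone who can guess a file's content can confirm the guess by encrypting it and comparing ciphertexts. That is the weaker security the abstract accepts for popular data only.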

By: Jan Stanek, Alessandro Sorniotti, Elli Androulaki, Lukas Kencl

Published in: RZ3852 in 2013


This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).


Questions about this service can be mailed to reports@us.ibm.com.