In-Place Reconstruction of Delta Compressed Files

In-place reconstruction of delta compressed data allows information on devices with limited storage
capability to be updated efficiently over low-bandwidth channels. Delta compression encodes a version of data compactly as a small set of changes from a previous version. Transmitting updates to data as delta versions saves both time and bandwidth. In-place reconstruction rebuilds the new version of the data in the storage or memory space the current version occupies -- no additional scratch space is needed.
By combining these technologies, we support large-scale, highly-mobile applications on inexpensive
hardware. We develop an algorithm that modifies a delta compressed version to be in-place reconstructible; it trades a small amount of compression to achieve in-place reconstruction. This algorithm brings the benefits of delta compression to space-constrained machines and devices on low-bandwidth networks. Our treatment includes an implementation and experimental results that show our algorithm to be efficient in space and time and verifies that compression losses are small. Also, we give results on the computational complexity of performing this modification while minimizing lost compression. We take a data-driven approach to determine important performance features, classifying files distributed on the Internet based on their in-place properties, and exploring the scaling relationship between files and data structures used by in-place algorithms. We conclude that in-place algorithms are I/O bound and that the performance of algorithms is more sensitive to the size of inputs and outputs, rather than the computational time to modify the delta version.

By: Randal Burns, Larry Stockmeyer, Darell D. E. Long

Published in: RJ10243 in 2002

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rj10243.pdf

Questions about this service can be mailed to reports@us.ibm.com .