Long-Term Archiving of Digital Information

The preservation of digital data for the long term presents a variety of challenges. First and foremost, there is the technical challenge. How can we avoid losing data because of changes in storage media, devices, and data formats? How will it be possible, in the future, to run a viewer for an archived multimedia file, a computer-aided design system, or even a popular game? Then, there are social and behavioral aspects. Which information needs to be archived, and who decides? What about intellectual property, proof of authenticity, etc.? Clearly, all of these aspects will require much work before full solutions can be proposed and implemented. This paper focuses solely on the technical aspects of the problem and, more precisely, on how to interpret a bit stream that has been successfully archived and later retrieved.

A distinction is made between archiving a data file and archiving a program (so that its behavior may be reenacted in the future), and a research direction is proposed for both problems, based on the same basic mechanism.

For the archiving of a data file, the proposal consists of specifying the processing that needs to be performed on the data (as physically stored) in order to return the information to a future client (according to a logical view of the data). The process specification and the logical view definition need to be archived with the data.

For the archiving of a program's behavior, the proposal consists of saving the original executable object code together with the specification of the processing that needs to be performed for each machine instruction of the original computer.
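As a rough illustration of the idea, the sketch below pairs a piece of archived object code with one processing specification per machine instruction, then reenacts the program by applying those specifications in sequence. Everything in it is an assumption made for the example: the one-accumulator "original computer", its opcodes (LOAD, ADD, EMIT, HALT), the two-byte instruction layout, and the function names are hypothetical and are not taken from the report. Plain Python merely stands in for whatever notation the specifications would actually be written in.

```python
# Hypothetical "original computer": one 8-bit accumulator and two-byte
# instructions (opcode, operand). None of these opcodes come from the report.
LOAD, ADD, EMIT, HALT = 0x01, 0x02, 0x03, 0xFF


def spec_load(state, operand):
    """LOAD: place the operand in the accumulator."""
    state["acc"] = operand


def spec_add(state, operand):
    """ADD: add the operand to the accumulator with 8-bit wraparound."""
    state["acc"] = (state["acc"] + operand) & 0xFF


def spec_emit(state, operand):
    """EMIT: append the accumulator to the output (operand is ignored)."""
    state["out"].append(state["acc"])


# The archived processing specification: one entry per machine instruction.
INSTRUCTION_SPECS = {LOAD: spec_load, ADD: spec_add, EMIT: spec_emit}


def reenact(object_code: bytes) -> list:
    """Reenact the archived program by applying the per-instruction specs."""
    state = {"acc": 0, "pc": 0, "out": []}
    while state["pc"] + 1 < len(object_code):
        opcode = object_code[state["pc"]]
        operand = object_code[state["pc"] + 1]
        state["pc"] += 2
        if opcode == HALT:
            break
        INSTRUCTION_SPECS[opcode](state, operand)
    return state["out"]


# The original executable, saved unchanged as a bit stream.
ARCHIVED_OBJECT_CODE = bytes([LOAD, 40, ADD, 2, EMIT, 0, HALT, 0])

if __name__ == "__main__":
    print(reenact(ARCHIVED_OBJECT_CODE))
```

The point of the sketch is that the object code itself is never translated or rewritten; only the small table of instruction specifications has to remain interpretable in the future.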

Both processing specifications are based on the use of a Universal Virtual Computer that is general, yet basic enough to remain relevant in the future.
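To make the data-file case more concrete, here is a minimal sketch of a very small virtual computer that runs an archived decoder over the physically stored bytes and returns tagged values according to a logical view. It is only an illustration under stated assumptions: the instruction set (IN, ADDI, OUT, JNEG, JMP, HALT), the archive layout, the sample byte encoding, and the names `run_uvc` and `ARCHIVE` are invented here and do not describe the Universal Virtual Computer as actually specified in the report.

```python
def run_uvc(program, data: bytes):
    """Interpret a decoder written for a toy virtual computer.

    Hypothetical instruction set (not from the report):
      ("IN", r)          read the next byte into register r, or -1 at end
      ("ADDI", r, n)     add the constant n to register r
      ("OUT", tag, r)    emit (tag, value of register r) to the client
      ("JNEG", r, addr)  jump to addr if register r is negative
      ("JMP", addr)      unconditional jump
      ("HALT",)          stop and return everything emitted so far
    """
    regs, pos, pc, out = {}, 0, 0, []
    while True:
        op = program[pc]
        pc += 1
        if op[0] == "IN":
            regs[op[1]] = data[pos] if pos < len(data) else -1
            pos += 1
        elif op[0] == "ADDI":
            regs[op[1]] += op[2]
        elif op[0] == "OUT":
            out.append((op[1], regs[op[2]]))
        elif op[0] == "JNEG":
            if regs[op[1]] < 0:
                pc = op[2]
        elif op[0] == "JMP":
            pc = op[1]
        elif op[0] == "HALT":
            return out
        else:
            raise ValueError(f"unknown instruction: {op!r}")


# Assumed archive contents: the bits as physically stored, the decoder, and
# the logical view. Each record is two bytes: (year - 1900, temperature + 50).
ARCHIVE = {
    "bits": bytes([100, 65, 101, 48]),
    "logical_view": ["year", "temp_c"],
    "decoder": [
        ("IN", "r0"),             # 0: read the year byte
        ("JNEG", "r0", 8),        # 1: end of data, go to HALT
        ("ADDI", "r0", 1900),     # 2: undo the year offset
        ("OUT", "year", "r0"),    # 3
        ("IN", "r1"),             # 4: read the temperature byte
        ("ADDI", "r1", -50),      # 5: undo the temperature bias
        ("OUT", "temp_c", "r1"),  # 6
        ("JMP", 0),               # 7: next record
        ("HALT",),                # 8
    ],
}

if __name__ == "__main__":
    for tag, value in run_uvc(ARCHIVE["decoder"], ARCHIVE["bits"]):
        print(tag, value)
```

A future client would then need only an interpreter for the small instruction set; the knowledge of the physical encoding lives in the archived decoder rather than in the client.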

By: Raymond A. Lorie

Published in: RJ10185 in 2000



