On Certain Pathwise Properties of the Sliding Window Lempel-Ziv Algorithm

Copyright © (2006) by IEEE. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distrubuted for profit. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

We derive a number of almost sure results related to the sliding window Lempel-Ziv (SWLZ) algorithm. A principal result is a pathwise lower bound to the redundancy which is off by a factor of two from the main term in the lower bound of Wyner [1] and Yang and Kieffer [2], which hold inthe expected sense for the Fixed Database Lempel Ziv algorithm, for different definitions of the compression ratio. Other results include a relatively weak upper bound on the redundancy and reasonably tight upper and lower bounds for the number of bits spent on the encoding of the phrase lengths. The main theme in deriving these results is a simple phrase length thresholding technique that traces its roots back to the work of Wyner and Ziv [3]; a significant aspect the present work can be viewed as exploring the extent to which said thresholding idea can give more detailed information about the properties of SWLZ. Another aspect of the present work studies the asymptotic behavior of the ratio of the number of phrases to the length of the parsed string for any finite sliding window size; in here we exploit results of Kieffer and Rahe [4]. In all cases it is assumed that the source is stationary and that in the most restrictive case it is an irreducible and aperiodic Markov chain; some of the results hold for sources that have exponential rates for entropy and more generally for the ergodic setting.

By: Luis Alfonso Lastras-Montaño

Published in: IEEE Transactions on Information Theory, volume 52, (no 12), pages 5267-83 in 2006

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rc23813.pdf

Questions about this service can be mailed to reports@us.ibm.com .