Constant-Time Sliding Window Aggregation

Sliding-window aggregation is a widely-used approach for extracting insights from the most recent portion of a data stream. Most aggregation operations of interest can be cast as binary operators that are associative, but not necessarily commutative nor invertible. However, non-invertible operators are nontrivial to support efficiently. The best existing algorithms for this setting require O(log n) aggregation steps per window operation, where n is the window size at that point.

This paper presents DABA, a novel algorithm that significantly improves upon this time bound, assuming the sliding window has FIFO semantics. DABA requires only O(1) aggregation steps, in the worst case, per window operation. As such, DABA asymptotically improves the performance of sliding-window aggregation without restricting the operator to be invertible. Furthermore, our experimental results demonstrate that these theoretic improvements hold in practice. DABA is a significant improvement over the state-of-the-art for both throughput and latency.

By: Kanat Tangwongsan, Martin Hirzel, Scott Schneider

Published in: RC25574 in 2015

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rc25574.pdf

Questions about this service can be mailed to reports@us.ibm.com .