On Range Query Indexing for Efficient Stream Processing

To monitor a large number of continual range queries against a rapid data stream, each incoming data item should only be evaluated against relevant queries, not all the queries. Generally speaking, a main memory-based query index with a small storage cost and a fast search time is needed. In this paper, we study a 2D range query index that meets both criteria. It centers around a set of predefined, containment-encoded squares, or CES’s. CES’s are multi-layered, virtual constructs used to decompose range queries and maintain the query index. With containment-encoding, the search process is extremely efficient; most of the operations can be carried out by a simple logical-shift instruction. Simulations show that, with a small index storage cost, the CES-based query index substantially outperforms other alternatives in search time.

By: Kun-Lung Wu; Shyh-Kwei Chen; Philip S. Yu

Published in: RC23873 in 2006

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rc23873.pdf

Questions about this service can be mailed to reports@us.ibm.com .