COLA: Optimizing Stream Processing Applications Via Graph Partitioning

In this paper, we describe an optimization scheme for fusing compile-time operators into reasonably-sized run-time software units called processing elements (PEs). Such PEs are the basic deployable units in System S, a highly scalable distributed stream processing middleware system. Finding a high quality fusion significantly benefits the performance of streaming jobs. In order to maximize throughput, our solution approach attempts to minimize the processing cost associated with inter-PE stream traffic while simultaneously balancing load across the processing hosts. Our algorithm computes a hierarchical partitioning of the operator graph based on a minimum-ratio cut subroutine. We also incorporate several fusion constraints in order to support real-world System S jobs. We experimentally compare our algorithm with several other reasonable alternative schemes, highlighting the e ectiveness of our approach.

By: Rohit Khandekar, Kirsten Hildrum, Sujay Parekh, Deepak Rajan, Joel Wolf, Kun-Lung Wu, Henrique Andrade, Bugra Gedik

Published in: Middleware 2009, Urbana-Champaign, Springer Verlag, p.308-27 in 2009

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.

Questions about this service can be mailed to reports@us.ibm.com .