Performance Modeling of Operators in a Streaming System

Modeling the resource consumption of each runtime processing element (PE) is essential to optimal resource allocation in System S, a distributed stream processing platform. SPADE is the programming language of System S for developing streaming applications using an operator-based approach. Because a SPADE operator tends to consume little CPU, multiple operators are usually fused at compile time into PEs for efficient runtime deployment. As a result, modeling the resource function (RF) at the SPADE operator level becomes increasingly important for the system to optimally (1) fuse operators into PEs at compile time and (2) allocate PEs to physical nodes at runtime. There are two main challenges in modeling operator-level resource functions. First, how do we recover the baseline operator-level resource functions (OP RFs) from raw data collected with limited precision and under a changing runtime environment? Second, how do we estimate the resource function for a PE with any given fusion and node mapping from the baseline OP RFs?

In this paper, we propose a new operator-level RF learning infrastructure for System S. The infrastructure specifies the procedures needed to (i) recover OP RFs from PEs running in fused or unfused mode and (ii) use the resulting OP RFs to predict PE RFs under different fusion scenarios. We study resource profiling for the major SPADE built-in operators and present several techniques to overcome measurement errors in SPADE operator data collection. We also study the impact of hardware speed and multi-threading contention. We show that our method can be applied to several SPADE applications and that the predicted PE RFs are, on average, within 15% of the actual CPU usage fractions measured from PEs at runtime.
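The abstract does not give the form of the resource functions, so the following is only a minimal illustrative sketch of the two-step idea (recover OP RFs, then compose them into a PE RF). It assumes, purely for illustration, that each OP RF is linear in the operator's input tuple rate and that the CPU fraction of a fused PE is the sum of its operators' contributions. The function names `fit_linear_rf`, `predict_pe_cpu`, and `relative_error`, and all numbers, are hypothetical and are not taken from the report.

```python
from statistics import mean


def fit_linear_rf(rates, cpu_fractions):
    """Least-squares fit of cpu = slope * rate + intercept.

    Assumption: an operator's CPU fraction is linear in its input
    tuple rate. The report may use a different model.
    """
    r_bar, c_bar = mean(rates), mean(cpu_fractions)
    sxx = sum((r - r_bar) ** 2 for r in rates)
    sxy = sum((r - r_bar) * (c - c_bar) for r, c in zip(rates, cpu_fractions))
    slope = sxy / sxx
    return slope, c_bar - slope * r_bar


def predict_pe_cpu(op_rfs, op_rates):
    """Predict a fused PE's CPU fraction as the sum of its operators' RFs.

    Assumption: fusion is purely additive (no contention or saved
    transport overhead), which is a simplification.
    """
    return sum(
        slope * rate + intercept
        for (slope, intercept), rate in zip(op_rfs, op_rates)
    )


def relative_error(predicted, measured):
    """Relative prediction error against a measured CPU fraction."""
    return abs(predicted - measured) / measured


# Hypothetical profiling samples for one operator: (tuples/s, CPU fraction).
slope, intercept = fit_linear_rf([100, 200, 300], [0.02, 0.04, 0.06])

# Hypothetical PE fusing two operators with known RFs and input rates.
pe_cpu = predict_pe_cpu([(0.0002, 0.0), (0.0001, 0.01)], [1000, 500])
error = relative_error(pe_cpu, 0.25)  # 0.25 is a made-up measurement
```

In this sketch the comparison in `relative_error` plays the role of the accuracy check described above, where predicted PE RFs are compared against runtime PE measurements.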

By: Xiaolan J. Zhang; Sujay S. Parekh; Bugra Gedik; Henrique Andrade; Kun-Lung Wu

Published in: RC24945 in 2009

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

Questions about this service can be mailed to reports@us.ibm.com.