SPL: An Extensible Language for Distributed Stream Processing

Copyright © 2017 by Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

Big data is revolutionizing how all sectors of our economy do business, including telco, transportation, medical, and finance. Big data comes in two flavors: data at rest and data in motion. Processing data in motion is stream processing. Stream processing for big data analytics often requires scale that can only be delivered by a distributed system, exploiting parallelism on many hosts and many cores. To address this need, IBM built InfoSphere® Streams, a distributed stream processing platform. Early customer experience with Streams uncovered that another core requirement is extensibility, since customers want to build high-performance domain-specific operators for use in their streaming applications. Based on these two core requirements of distribution and extensibility, we designed and implemented a stream processing language called SPL. This paper describes SPL with an emphasis on the language design, distributed runtime, and extensibility mechanism. SPL is now the gateway for the Streams platform, used by our internal (research) and external (industry) customers for stream processing in a broad range of application domains.

By: Martin Hirzel, Scott Schneider, Bugra Gedik

Published in: ACM Transactions on Programming Languages and Systems, volume 39, no. 1, 2017. DOI: 10.1145/3039207

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

Report number: RC25486