A High-Performance Domain Specific Parallel and Distributed Massive Collection System

High performance and ease of use are the two main goals of the Massive Collection System (MCS). At the outset, MCS is a classical process that consumes massive amounts of input, processes it according to business specifications, and produces a comparable amount of output. To do that, MCS has a massively parallel architecture whose core processing task executes the business rules on a continuous flow of input records organized in files. Each processing task executes a processing “plan” written in a high-level domain-specific language (DSL) designed for domain experts rather than professional programmers.

The MCS design for performance rests on two factors: the first is the massively parallel execution framework; the second is the effective compilation and execution of the domain-specific MCS plans. The execution framework is built on top of IBM's J2EE implementation, WebSphere Application Server (WAS). The entire MCS is a WAS application, written in Java, that achieves both its performance and its ease-of-use goals.
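To make the two factors concrete, the following is a minimal sketch, not the MCS implementation. It assumes plain `java.util.concurrent` in place of the WAS execution framework, and it models a compiled plan as a hypothetical `Plan` interface that maps one input record to one output record. The names `McsSketch`, `Plan`, `processFile`, and `processAll` are illustrative only and do not come from the report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class McsSketch {
    /** Hypothetical stand-in for a compiled plan: one input record in, one output record out. */
    public interface Plan extends Function<String, String> {}

    /** Applies the plan to every record of one input "file" (an in-memory list here). */
    public static List<String> processFile(List<String> records, Plan plan) {
        List<String> out = new ArrayList<>(records.size());
        for (String record : records) {
            out.add(plan.apply(record));
        }
        return out;
    }

    /** Processes files in parallel, one task per file; results keep the input file order. */
    public static List<List<String>> processAll(List<List<String>> files, Plan plan, int workers)
            throws Exception {
        // Assumption: a fixed thread pool stands in for the application server's workers.
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> file : files) {
                futures.add(pool.submit(() -> processFile(file, plan)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get()); // blocks until that file's task has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // A toy "business rule": normalize each record to trimmed upper case.
        Plan normalize = record -> record.trim().toUpperCase();
        List<List<String>> files = List.of(List.of(" a ", "b"), List.of("c"));
        System.out.println(processAll(files, normalize, 2));
    }
}
```

Because each file is an independent task in this sketch, adding workers lets more files be processed at once, which is the kind of structure under which near-linear scaling with input size is plausible.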

The performance requirements for MCS were stated in terms of hundreds of millions of records per day. We selected Java and WAS for the implementation because of their development advantages, which let us demonstrate early, within several months, that MCS could meet its performance goals; processing was shown to scale almost linearly with input size.

By: Uri Shani; Aviad Sela; Inna Skarbovsky

Published in: H-0250 in 2006

LIMITED DISTRIBUTION NOTICE:

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
