Analyzing Analytics Part 1: A Survey of Business Analytics Models and Algorithms

Many organizations today face the challenge of processing and distilling information from huge and growing collections of data. Examples include retailers trying to better serve their customers, telecommunications providers trying to manage their growing networks more effectively, cities trying to provide smarter infrastructure and services for rapidly growing populations, and many similar cases across multiple industries. These organizations are facing a deluge of information and are increasingly deploying sophisticated mathematical algorithms to model the behavior of their business processes, discover correlations in the data, predict trends, and ultimately drive decisions that optimize their operations. These techniques are known collectively as "analytics" and draw upon multiple disciplines, including statistics, quantitative analysis, data mining, and machine learning.

In this survey paper and the accompanying research report, we identify some of the key techniques employed in analytics, both to serve as an introduction for the non-specialist and to explore opportunities for better optimization on parallel computer architectures and systems software. We are interested in isolating and documenting repeated patterns in analytical algorithms, data structures, and data types, and in understanding how these patterns could be mapped most effectively onto parallel systems. Scalable and efficient parallelism is critical for organizations to apply these techniques to ever larger data sets and to reduce the time these analyses take. To this end, we focus on analytical models (e.g., neural networks, logistic regression, or support vector machines) that can be executed using different algorithms. For most major model types, we study implementations of key algorithms to determine common computational and runtime patterns. We then use this information to characterize and recommend suitable parallelization strategies.

By: Rajesh Bordawekar; Bob Blainey; Chidanand Apte; Michael McRoberts

Published as: IBM Research Report RC25186 (2011)

LIMITED DISTRIBUTION NOTICE:

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).