Accelerating Business Analytics Applications

Business and text analytics applications have seen rapid growth, driven by the mining of data for various decision making processes. Regular expression processing is an important component of these applications, consuming as much as 50% of their total execution time. While prior work on accelerating regular expression processing has focused on network intrusion detection systems, business analytics applications impose different requirements on regular expression processing efficiency. We present an analytical model of accelerators for regular expression processing, which includes memory, bus-, I/O-, and network-attached accelerators with a focus on business analytics applications. Based on this model, we advocate the use of vector-style processing for regular expressions in business analytics applications, leveraging the SIMD hardware available in many modern processors. In addition, we show how SIMD hardware can be enhanced to improve regular expression processing even further. We demonstrate a realized speedup better than 1.8 for the entire range of data sizes of interest. In comparison, the alternative strategies deliver only marginal improvement for large data sizes, while performing much worse than the SIMD solution for small data
sizes.

By: Valentina Salapura; Priya Nagpurkar; Tejas Karkhanis; José Moreira

Published in: RC25242 in 2011

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rc25242.pdf

Questions about this service can be mailed to reports@us.ibm.com .