Extracting information from unstructured text data is a compute-intensive task. The performance of general-purpose processors cannot keep up with the rapid growth of textual data. Therefore we discuss the use of FPGAs to perform large-scale text analytics. We present a framework consisting of a compiler and an operator library capable of generating a Verilog processing pipeline from a text analytics query specified in the annotation query language AQL. The operator library comprises a set of configurable modules capable of performing relational and extraction tasks which can be assembled by the compiler to represent a full annotation operator graph. Leveraging the nature of text processing, we show that most tasks can be performed in an efficient streaming fashion.
A copy of this report is available from pol@zurich.ibm.com
By: Raphael Polig, Kubilay Atasu, Heiner Giefers and Laura Chiticariu
Published in: RZ3864 in 2014
This Research Report is not available electronically. Please request a copy from the contact listed below. IBM employees should contact ITIRC for a copy.
Questions about this service can be mailed to reports@us.ibm.com .