SystemT: An Algebraic Approach to Declarative Information Extraction

As information extraction (IE) becomes more central to enterprise applications, rule-based IE engines have become increasingly important. In this paper, we describe SystemT, a rule-based IE system whose basic design removes the expressivity and performance limitations of current systems based on cascading grammars. SystemT uses a declarative rule language, AQL, and an optimizer that generates high-performance algebraic execution plans for AQL rules. We compare SystemT’s approach against cascading grammars, both theoretically and through a thorough experimental evaluation. Our results show that SystemT can deliver result quality comparable to the state of the art while achieving an order of magnitude higher annotation throughput.

By: Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Sudarshan Rangarajan, Frederick R. Reiss, Shivakumar Vaithyanathan

Published in: RJ10523 in 2014

