The Contribution of Finite-State Technology to Named Entity Recognition and Typing

This brief note revisits the question of the relative merits of manual pattern crafting and machine learning techniques for named entity recognition and typing. In particular, it outlines an experiment which exemplifies, and empirically validates, the strengths of a combined approach in which a robust classification algorithm makes informed use of finite-state grammars defining a number of semantic categories. The experiment assumes that a document can be submitted for analysis by independent devices, one or more of which is grammar-based, and that suitable machinery exists for principled combination of the resulting analysis streams. Under these assumptions, it demonstrates that high-precision, pattern-driven identification of semantic categories can significantly boost the overall performance of the combination device, even if the grammars target only a subset of the larger set of categories of interest.
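The idea of combining analysis streams can be illustrated with a minimal sketch. It is not the report's method: the patterns, the category names (DATE, MONEY, NAME), the stand-in `fallback_annotate` function, and the priority-based merge in `combine` are all assumptions made for illustration. Regular expressions play the role of the finite-state grammars, which cover only a subset of the categories, and a crude heuristic stands in for a learned classifier.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, category)

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Hypothetical high-precision patterns; they cover only two categories.
GRAMMAR = {
    "DATE": re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b"),
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?(?: (?:million|billion))?"),
}


def grammar_annotate(text: str) -> List[Span]:
    """Pattern-driven stream: one span per grammar match."""
    spans = []
    for category, pattern in GRAMMAR.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), category))
    return sorted(spans)


def fallback_annotate(text: str) -> List[Span]:
    """Stand-in for a learned classifier (assumption): tags every run of
    capitalized words as NAME, which over-generates on purpose."""
    pattern = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b")
    return [(m.start(), m.end(), "NAME") for m in pattern.finditer(text)]


def combine(primary: List[Span], secondary: List[Span]) -> List[Span]:
    """Keep every primary span; add a secondary span only if it overlaps
    no primary span. A simple priority merge, assumed for illustration."""
    merged = list(primary)
    for start, end, category in secondary:
        if all(end <= p_start or start >= p_end for p_start, p_end, _ in primary):
            merged.append((start, end, category))
    return sorted(merged)
```

In this sketch the stand-in classifier would tag the month name inside a date as NAME. Because the date pattern already covers that span, the merge discards the spurious tag, which is one simple way a high-precision grammar stream can improve a combined result.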

By: Branimir K. Boguraev

Published in: RC22971 in 2003

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
