Extracting Names From Natural Language Text

We describe Nominator, a module we developed to extract proper names
from natural language text. Nominator is currently being integrated
into IBM products and services. Using fast and robust heuristics,
Nominator locates names in text, determines what type of entity they
refer to -- such as person, place or organization -- and groups
together all the variant names that refer to the same entity. For
example, President Clinton, Mr. Clinton and Bill Clinton are grouped
as referring to the same person. Each group is assigned a canonical
name (e.g., Bill Clinton) to distinguish it from other groups
referring to other entities (e.g., Clinton, New Jersey). Nominator
produces a dictionary, or database, of names associated with a
collection of documents.
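
The grouping and canonical-name steps described above can be illustrated with a small sketch. This is not Nominator's implementation: the title list, the surname-based grouping rule, the "most tokens wins" choice of canonical name, and the function names strip_title and group_variants are all assumptions made for illustration. The sketch also assumes that names have already been located and typed.

```python
from collections import defaultdict

# Hypothetical title list for illustration; not taken from Nominator.
TITLES = {"president", "mr.", "mrs.", "ms.", "dr."}


def strip_title(name):
    """Drop leading titles, always keeping at least one token."""
    tokens = name.split()
    while len(tokens) > 1 and tokens[0].lower() in TITLES:
        tokens = tokens[1:]
    return " ".join(tokens)


def group_variants(mentions):
    """Group (name, entity_type) pairs into entities.

    Assumed heuristic: person names are grouped by their last token
    (the surname); all other names are grouped by their full text.
    Returns a dict mapping (canonical_name, entity_type) to the sorted
    list of variants in that group.
    """
    groups = defaultdict(set)
    for name, entity_type in mentions:
        core = strip_title(name)
        if entity_type == "person":
            key = (entity_type, core.split()[-1].lower())
        else:
            key = (entity_type, core.lower())
        groups[key].add(name)

    dictionary = {}
    for (entity_type, _), variants in groups.items():
        # Assumed rule: the title-free variant with the most tokens
        # is the canonical name (ties broken alphabetically).
        canonical = max(
            (strip_title(v) for v in variants),
            key=lambda s: (len(s.split()), s),
        )
        dictionary[(canonical, entity_type)] = sorted(variants)
    return dictionary


mentions = [
    ("President Clinton", "person"),
    ("Mr. Clinton", "person"),
    ("Bill Clinton", "person"),
    ("Clinton, New Jersey", "place"),
]
names = group_variants(mentions)
```

With the example mentions from the abstract, the three person variants fall into one group with the canonical name Bill Clinton, while Clinton, New Jersey stays a separate place entry because the entity type is part of the grouping key.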

By: Yael Ravin, Nina Wacholder

Published in: RC20338 in 1997



