Extracting Names From Natural Language Text

We describe Nominator, a module we developed to extract proper names
from natural language text. Nominator is currently being integrated
into IBM products and services. Using fast and robust heuristics,
Nominator locates names in text, determines what type of entity they
refer to -- such as person, place or organization -- and groups
together all the variant names that refer to the same entity. For
example, President Clinton, Mr. Clinton and Bill Clinton are grouped
as referring to the same person. Each group is assigned a canonical
name (e.g., Bill Clinton) to distinguish it from other groups
referring to other entities (e.g., Clinton, New Jersey). Nominator
produces a dictionary, or database, of names associated with a
collection of documents.
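
To illustrate the grouping step described above, the following is a
minimal Python sketch. It is not Nominator's implementation: the title
list, the surname-based grouping key, the comma rule for places, and
all function names (strip_title, guess_type, group_names) are
assumptions made for illustration only, since the heuristics the
module actually uses are not given here.

```python
from collections import defaultdict

# Hypothetical honorific list; an assumption, not Nominator's actual data.
TITLES = {"president", "mr.", "mrs.", "ms.", "dr."}


def strip_title(name):
    """Drop leading honorifics, e.g. 'President Clinton' -> 'Clinton'."""
    tokens = name.split()
    while tokens and tokens[0].lower() in TITLES:
        tokens = tokens[1:]
    return " ".join(tokens)


def guess_type(name):
    """Toy typing rule (assumption): 'X, Y' is a place, anything else a person."""
    return "place" if "," in name else "person"


def group_names(names):
    """Group variant names and assign each group a canonical name."""
    groups = defaultdict(list)
    for name in names:
        kind = guess_type(name)
        if kind == "person":
            # Fall back to the raw name if it consists only of titles.
            core = strip_title(name) or name
            key = core.split()[-1].lower()  # surname as the grouping key
        else:
            key = name.lower()
        groups[(kind, key)].append(name)

    result = {}
    for (kind, _), variants in groups.items():
        # Canonical name: the longest variant once titles are removed.
        canonical = max((strip_title(v) or v for v in variants), key=len)
        result[canonical] = {"type": kind, "variants": variants}
    return result


names = ["President Clinton", "Mr. Clinton", "Bill Clinton", "Clinton, New Jersey"]
for canonical, entry in group_names(names).items():
    print(canonical, entry["type"], entry["variants"])
```

With the example names from the text, the three person variants fall
into one group whose canonical name is Bill Clinton, while Clinton,
New Jersey forms a separate place group. A surname-only key would
wrongly merge different people who share a surname, which is one
reason a real system needs richer evidence than this sketch uses.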

By: Yael Ravin, Nina Wacholder

Published in: RC20338 in 1997


