Chinese Named Entity Recognition Based on Multilevel Linguistic Features

This paper presents a Chinese named entity recognition system that employs the Robust Risk Minimization (RRM) classification method and combines the advantages of character-based and word-based models. Through experiments on a large-scale corpus, we show that significant performance gains can be obtained by integrating various kinds of linguistic information (such as Chinese word segmentation, semantic types, part of speech, and named entity triggers) into a basic character-based model. A novel feature weighting mechanism is also employed to extract more useful cues from the most important linguistic features. Moreover, to overcome the limited computational resources available for building a high-quality named entity recognition system from a large-scale corpus, informative samples are selected by an active learning approach.
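The overall recipe described in the abstract (character-level features enriched with word segmentation, part-of-speech and trigger information, a linear RRM-style classifier, and active selection of informative samples) can be illustrated with the minimal Python sketch below. It is not the authors' implementation, and several pieces are assumptions. The feature template, the tuple layout of each annotated character, and all function names are hypothetical. The loss is the piecewise form f(v) = -2v for v < -1, (v - 1)^2 / 2 for -1 <= v <= 1, and 0 for v > 1, which appears in Tong Zhang's related RRM work; we assume, without confirmation, that it matches the loss used here. Training uses plain stochastic gradient descent without a regularization term, which is not necessarily the paper's solver. The active learning step uses simple margin-based uncertainty sampling, because the abstract does not name a selection criterion. For brevity the task is reduced to one binary decision per character (is it part of a person name?). Semantic types and the feature weighting mechanism are left out because the abstract does not specify them.

```python
from collections import defaultdict

# Each annotated character is a hypothetical tuple:
# (char, word-segmentation position tag B/I/E/S, POS of its word, is-trigger flag)

def features(sent, i):
    """Hypothetical multilevel feature template for the character at position i."""
    feats = [f"c0={sent[i][0]}"]
    for off in (-2, -1, 1, 2):  # character context window, padded at sentence edges
        j = i + off
        feats.append(f"c{off}={sent[j][0] if 0 <= j < len(sent) else '<PAD>'}")
    _, seg, pos, trig = sent[i]
    feats += [f"seg={seg}", f"pos={pos}", f"trig={int(trig)}"]
    return feats

def rrm_loss(v):
    """Assumed RRM loss at margin v = y * score (linear, then quadratic, then zero)."""
    if v < -1:
        return -2.0 * v
    if v <= 1:
        return 0.5 * (v - 1.0) ** 2
    return 0.0

def rrm_grad(v):
    """Derivative of rrm_loss with respect to the margin v."""
    if v < -1:
        return -2.0
    if v <= 1:
        return v - 1.0
    return 0.0

def score(w, feats):
    """Linear score: sum of the weights of the active binary features."""
    return sum(w.get(f, 0.0) for f in feats)

def train(samples, epochs=20, lr=0.1):
    """Plain SGD on the assumed RRM loss; labels y are +1 or -1 (no regularizer)."""
    w = defaultdict(float)
    for _ in range(epochs):
        for feats, y in samples:
            g = rrm_grad(y * score(w, feats))
            for f in feats:  # d(loss)/d(w_f) = g * y for a binary feature
                w[f] -= lr * g * y
    return w

def select_uncertain(w, pool, k):
    """Margin-based active learning: the k unlabeled samples nearest the boundary."""
    return sorted(pool, key=lambda feats: abs(score(w, feats)))[:k]

if __name__ == "__main__":
    # 张先生在北京 ("Mr. Zhang is in Beijing"); 先生 ("Mr.") acts as a person-name trigger.
    sent = [("张", "S", "nr", False), ("先", "B", "n", True), ("生", "E", "n", True),
            ("在", "S", "p", False), ("北", "B", "ns", False), ("京", "E", "ns", False)]
    # Binary task: is the character part of a person name?
    samples = [(features(sent, i), 1 if i == 0 else -1) for i in range(len(sent))]
    w = train(samples)
    for i, (ch, *_) in enumerate(sent):
        print(ch, round(score(w, features(sent, i)), 3))
    print(select_uncertain(w, [features(sent, i) for i in range(len(sent))], k=2))
```

Keeping every feature as a sparse binary string makes it easy to add further linguistic levels (for example a semantic-type tag) by appending one more template entry, without changing the classifier. A full system would train one such classifier per entity tag and decode the tag sequence over the whole sentence.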

By: Honglei Guo, Jian Min Jiang, Gang Hu, Tong Zhang

Published in: Lecture Notes in Artificial Intelligence, volume 3248, pages 90-99, 2004
