Capitalization Recovery for Text

Proper capitalization in text is a useful, often mandatory characteristic. Many text processing techniques rely on proper capitalization, and people can more easily read mixed case text. Proper capitalization, however, is often absent in a number of text sources, including automatic speech recognition output and closed caption text. The value of these text sources can be greatly enhanced with proper capitalization. We describe and evaluate a series of techniques that can recover proper capitalization. Our final system is able to recover more than 88% of the capitalized words with better than 90% accuracy.

By: Eric W. Brown, Anni R. Coden

Published in: Lecture Notes in Computer Science, volume 2273, pages 11-22, 2002

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.
