A family registration data entry system with functions for
automatic form layout analysis and character recognition
was developed. The layout analysis module first detects characters
and ruled lines by using information on the top and bottom boundaries
of smeared black components. It then determines the layout and identifies each
field in the layout by comparing predefined models with
detected lines. Character strings in the fields are recognized
and matched against a dictionary to check whether each sequence is
plausible as a Japanese word. The text data are registered
in a database after they have been examined by an operator
and keywords have been extracted.
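
The dictionary check described above can be illustrated with a minimal sketch. It assumes that the recognizer returns a ranked list of candidate characters for each position, which the abstract does not state; the function name best_dictionary_match and the sample words are hypothetical, and the report's actual matching method may differ.

```python
from itertools import product


def best_dictionary_match(candidates, dictionary):
    """Return the best-ranked candidate string found in the dictionary.

    candidates: one list per character position, ranked best-first
                (an assumed recognizer output format).
    dictionary: a set of known words.
    Returns None when no combination is a dictionary word, which a
    caller could use to flag the field for operator review.
    """
    # Pair every candidate with its rank so combinations can be ordered.
    ranked_positions = [list(enumerate(position)) for position in candidates]
    # Try combinations in order of total rank (lower is better).
    combos = sorted(
        product(*ranked_positions),
        key=lambda combo: sum(rank for rank, _ in combo),
    )
    for combo in combos:
        word = "".join(char for _, char in combo)
        if word in dictionary:
            return word
    return None


# Hypothetical example: the top candidate for the first character is wrong,
# but the dictionary recovers the intended word.
dictionary = {"東京", "豊島"}
result = best_dictionary_match([["束", "東"], ["京", "凉"]], dictionary)
```

Enumerating every combination is only practical for short strings with a few candidates per position; a real system would more likely search a trie or a lattice.
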
This system was used for the initial entry of typed
family registration forms in Tokyo's Toshima Ward, contributing
to the establishment of the first computerized family registration system
in Japan.
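
As a rough illustration of the layout analysis step, the sketch below smears a binary image horizontally, extracts connected black components, and uses each component's top and bottom boundaries to separate thin, long components (ruled lines) from taller ones (character blocks). The function names, the thresholds, and the use of a plain bounding box are all assumptions made for this example; the abstract does not specify how the report's module uses the boundary information.

```python
from collections import deque


def smear_rows(image, max_gap):
    """Fill white gaps of at most max_gap pixels between black pixels in a row."""
    smeared = []
    for row in image:
        out = list(row)
        last_black = None
        for x, pixel in enumerate(row):
            if pixel:
                if last_black is not None and 0 < x - last_black - 1 <= max_gap:
                    for fill in range(last_black + 1, x):
                        out[fill] = 1
                last_black = x
        smeared.append(out)
    return smeared


def components(image):
    """Return (top, bottom, left, right) boxes of 4-connected black components."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            top = bottom = y
            left = right = x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, bottom, left, right))
    return boxes


def classify(boxes, max_line_thickness, min_line_length):
    """Split boxes into ruled lines (thin and long) and character blocks."""
    lines, blocks = [], []
    for box in boxes:
        top, bottom, left, right = box
        thickness = bottom - top + 1   # from the top and bottom boundaries
        length = right - left + 1
        if thickness <= max_line_thickness and length >= min_line_length:
            lines.append(box)
        else:
            blocks.append(box)
    return lines, blocks


# Hypothetical 5 x 12 form fragment: one ruled line above two characters.
image = [
    [1] * 12,
    [0] * 12,
    [1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0] * 12,
]
lines, blocks = classify(components(smear_rows(image, max_gap=1)),
                         max_line_thickness=1, min_line_length=8)
```

This sketch handles only horizontal ruled lines that do not touch other ink; lines joined into a table grid would form a single component and would need to be split before classification.
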
By: Tomio AMANO, Kazuharu TOYOKAWA, Takashi MANO, and Shuhji TORIYAMA
Published in: RT0142 in 1996