A Document Image Analysis and Recognition System for Japanese Family Registration

        A family registration data entry system with functions for
        automatic form layout analysis and character recognition
        was developed. The layout analysis module first detects characters
        and ruled lines by using information on the top and bottom boundaries
        of smeared black components. It then determines the layout and
        identifies each field by comparing the detected lines with
        predefined models. Character strings in the fields are recognized
        and matched against a dictionary to check whether each sequence
        is a plausible Japanese word. The text data are registered
        in a database after they have been examined by an operator
        and keywords have been extracted.
        The system was used for the initial entry of typed family
        registration forms in Tokyo's Toshima Ward, contributing to the
        establishment of the first computerized family registration system
        in Japan.
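
        The smearing step mentioned in the abstract can be illustrated with a
        minimal sketch. This is not the report's algorithm: it assumes a
        simple horizontal run-length smearing of a binary image and flags a
        row as a ruled-line candidate when a smeared black run is long
        enough, whereas the report uses the top and bottom boundaries of
        smeared components. The function names and the parameters max_gap
        and min_line_len are hypothetical.

```python
from typing import List, Tuple

Row = List[int]  # 1 = black pixel, 0 = white pixel


def smear_row(row: Row, max_gap: int) -> Row:
    """Fill white gaps of at most max_gap pixels lying between black pixels."""
    out = list(row)
    last_black = None
    for x, pixel in enumerate(row):
        if pixel == 1:
            if last_black is not None and 0 < x - last_black - 1 <= max_gap:
                for k in range(last_black + 1, x):
                    out[k] = 1
            last_black = x
    return out


def black_runs(row: Row) -> List[Tuple[int, int]]:
    """Return (start, length) for every maximal run of black pixels."""
    runs = []
    start = None
    for x, pixel in enumerate(row):
        if pixel == 1 and start is None:
            start = x
        elif pixel == 0 and start is not None:
            runs.append((start, x - start))
            start = None
    if start is not None:
        runs.append((start, len(row) - start))
    return runs


def ruled_line_rows(image: List[Row], max_gap: int, min_line_len: int) -> List[int]:
    """Assumed heuristic: a row is a ruled-line candidate if, after smearing,
    it contains a black run of at least min_line_len pixels."""
    rows = []
    for y, row in enumerate(image):
        smeared = smear_row(row, max_gap)
        if any(length >= min_line_len for _, length in black_runs(smeared)):
            rows.append(y)
    return rows
```

        In this sketch, short runs that survive smearing would be treated as
        character candidates, and the long runs would be passed on to the
        comparison with predefined models.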

By: Tomio AMANO, Kazuharu TOYOKAWA, Takashi MANO, and Shuhji TORIYAMA

Published in: RT0142 in 1996

This Research Report is not available electronically. Please request a copy from the contact listed below. IBM employees should contact ITIRC for a copy.
