Automatic Allograph Categorization Based on Stroke Clustering for On-Line Handwritten Japanese Character Recognition

We propose an automatic method for categorizing the writing styles of characters (allographs), so that a recognition dictionary can be built to cover a variety of writing styles.
In the first step of allograph categorization, we categorize the handwritten strokes in the training data by using a clustering algorithm that is also proposed in this paper. After the algorithm has run, the centroid of each cluster is referred to as a prototype stroke. By using the prototype strokes, we then categorize handwritten characters to obtain allographs. In this approach, allographs share common prototype strokes, which allows us to reduce the dictionary size and the computational cost of recognition. Furthermore, we can compare two allographs to determine where the stroke order differs and which strokes are connected. In the stroke clustering, the number of clusters is determined automatically on the basis of a single parameter $\Delta$, which is specified before the clustering procedure and is the maximum error allowed within a stroke cluster. Allograph dictionaries for 2321 categories were built experimentally from handwritten characters produced by 121 writers. Recognition experiments using these dictionaries were carried out to obtain the relations between the parameter $\Delta$, the number of prototype strokes, the number of allographs, and the recognition accuracy.
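
The abstract does not spell out the clustering algorithm itself, so the following is only a minimal sketch of the general idea, not the report's method. It assumes each stroke has already been resampled to a fixed-length feature vector, uses Euclidean distance as the error measure, and swaps in a simple k-means-style loop that adds a cluster whenever some stroke lies farther than $\Delta$ from its prototype. The names `cluster_strokes`, `categorize_allographs`, and `nearest` are hypothetical.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def nearest(stroke, prototypes):
    """Index of the prototype closest to the given stroke."""
    return min(range(len(prototypes)), key=lambda k: dist(stroke, prototypes[k]))


def cluster_strokes(strokes, delta, max_iter=100):
    """Grow the number of clusters until every stroke is within `delta`
    of its prototype. ASSUMPTION: this stands in for the report's own
    clustering algorithm, which the abstract does not describe."""
    centroids = [mean(strokes)]
    while True:
        # k-means (Lloyd) refinement for the current number of clusters.
        for _ in range(max_iter):
            labels = [nearest(s, centroids) for s in strokes]
            updated = []
            for k, old in enumerate(centroids):
                members = [s for s, lab in zip(strokes, labels) if lab == k]
                updated.append(mean(members) if members else old)
            if updated == centroids:
                break
            centroids = updated
        labels = [nearest(s, centroids) for s in strokes]
        errors = [dist(s, centroids[lab]) for s, lab in zip(strokes, labels)]
        worst = max(range(len(strokes)), key=errors.__getitem__)
        if errors[worst] <= delta:
            return centroids, labels
        if len(centroids) >= len(strokes):
            # Fallback: one prototype per stroke, so every error is zero.
            return [tuple(s) for s in strokes], list(range(len(strokes)))
        # Seed a new cluster at the stroke with the largest error.
        centroids.append(tuple(strokes[worst]))


def categorize_allographs(characters, prototypes):
    """Group characters by (label, sequence of prototype-stroke indices).
    ASSUMPTION: an allograph is identified by that sequence alone."""
    allographs = defaultdict(list)
    for index, (label, strokes) in enumerate(characters):
        key = (label, tuple(nearest(s, prototypes) for s in strokes))
        allographs[key].append(index)
    return dict(allographs)


# Toy data: two horizontal-like and two vertical-like strokes (x0, y0, x1, y1).
strokes = [(0, 0, 1, 0), (0, 0.1, 1, 0.1), (0, 0, 0, 1), (0.1, 0, 0.1, 1)]
prototypes, labels = cluster_strokes(strokes, delta=0.2)

# Three samples of one character; the third is written in the opposite stroke order.
characters = [
    ("A", [strokes[0], strokes[2]]),
    ("A", [strokes[1], strokes[3]]),
    ("A", [strokes[2], strokes[0]]),
]
allographs = categorize_allographs(characters, prototypes)
```

In this sketch a smaller `delta` yields more prototype strokes and therefore more distinct allographs, which mirrors the trade-off the experiments examine. Because allographs are stored as sequences of shared prototype indices, two allographs of the same character can be compared index by index to see where their stroke order differs.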

By: Kazutaka Yamasaki

Published in: RT0290 in 2002

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

