A Framework for Forms Processing by Using Enhanced-Line-Shared-Adjacent Format

The objective of this paper is to introduce a novel framework for forms processing
that seamlessly handles two different kinds of formats:
physical formats, whose fields have rigidly defined positions and sizes, and
topological formats, in which variations in the positions and sizes of fields are acceptable
as long as the topological relations between pairs of fields are preserved.
A line-shared-adjacent (LSA) cell relation and an LSA format are introduced to define
topological formats; they are then enhanced to describe physical information as well,
yielding an enhanced LSA (e-LSA) cell relation and an e-LSA format.
The e-LSA format is flexible enough to define not only physical and topological formats
but also hybrid formats, and our framework is built on it.
Because the format combines characteristics of both physical and topological formats,
it enables the framework to handle the two kinds of information seamlessly.
The framework consists of four modules: a format generator, a format converter,
a format class manager, and a form processor. Together they perform all of the
field-detection processes on which our research focuses.
In this paper, their collaboration is illustrated with an example,
which supports the effectiveness of our framework.
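The distinction between physical and topological formats can be illustrated with a minimal Python sketch. This is not the report's LSA or e-LSA formalism, whose definitions are not given in this abstract. The `Field` class, the coarse left/right/above/below `relation`, the tolerance `tol`, and the reading of a hybrid format as "some fields rigid, all pairwise relations preserved" are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Field:
    """Hypothetical rectangular field: top-left corner (x, y), width w, height h.
    Image coordinates are assumed, so y grows downward."""
    name: str
    x: float
    y: float
    w: float
    h: float


def relation(a: Field, b: Field) -> tuple:
    """Assumed coarse topological relation of a to b (horizontal, vertical order)."""
    if a.x + a.w <= b.x:
        horiz = "left"
    elif b.x + b.w <= a.x:
        horiz = "right"
    else:
        horiz = "overlap"
    if a.y + a.h <= b.y:
        vert = "above"
    elif b.y + b.h <= a.y:
        vert = "below"
    else:
        vert = "overlap"
    return (horiz, vert)


def matches_physical(template, instance, tol=2.0):
    """Physical format: every field keeps its position and size within tol."""
    return len(template) == len(instance) and all(
        abs(t.x - i.x) <= tol and abs(t.y - i.y) <= tol
        and abs(t.w - i.w) <= tol and abs(t.h - i.h) <= tol
        for t, i in zip(template, instance)
    )


def matches_topological(template, instance):
    """Topological format: positions and sizes may vary, but the relation
    between every pair of fields must be the same as in the template."""
    return len(template) == len(instance) and all(
        relation(t1, t2) == relation(i1, i2)
        for (t1, i1), (t2, i2) in combinations(zip(template, instance), 2)
    )


def matches_hybrid(template, instance, rigid, tol=2.0):
    """Hybrid format (assumed reading): fields named in `rigid` are checked
    physically, and all pairwise relations are checked topologically."""
    pairs = [(t, i) for t, i in zip(template, instance) if t.name in rigid]
    return matches_topological(template, instance) and matches_physical(
        [t for t, _ in pairs], [i for _, i in pairs], tol
    )


# Usage: a filled form whose fields were stretched but kept their layout.
template = [Field("name", 0, 0, 100, 20), Field("date", 120, 0, 60, 20),
            Field("amount", 0, 40, 180, 20)]
stretched = [Field("name", 0, 0, 150, 25), Field("date", 170, 0, 60, 25),
             Field("amount", 0, 50, 230, 25)]
print(matches_physical(template, stretched))     # False
print(matches_topological(template, stretched))  # True
```

Comparing pairwise relations rather than absolute coordinates is what lets the topological check tolerate resizing; a real system would need a richer relation (for example, which cells share a ruling line) than this sketch assumes.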

By: Y. Hirayama

Published in: RT0297 in 2002

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
