Text Analysis as Formal Inference for the Purposes of Uniform Tracing and Explanation Generation

The high-level research goal of the Knowledge Associates for Novel Intelligence (KANI) project is to use a variety of advanced techniques to transform unstructured information, in the form of natural language text documents, into actionable knowledge: knowledge that is sufficiently structured, and assigned sufficiently precise semantics, that classical automated reasoning techniques may be applied to generate and test an intelligence analyst’s hypotheses. The project brings together experts in text extraction, knowledge representation and reasoning, and advanced user interfaces from Stanford, IBM Research, and Battelle.

The practical realization of this goal is an interactive system capable of assisting the intelligence analyst in quickly discovering, extracting, filtering and synthesizing complex unstructured data to arrive at a manageable task-relevant working knowledge-base (WKB). Through the application of automated reasoning techniques, KANI will assist the user in generating and testing hypotheses over the assertions and rules in the working knowledge-base.

A key element of KANI’s interaction with the user is its ability to explain its final and intermediate inferences. KANI must leave open the possibility that any inference made in the process of supporting or refuting a hypothesis may be called into question by the analyst. To allow this, KANI must provide a formal mechanism for tracing its reasoning from the top-most conclusions back through the lowest-level inferences.

This report proposes a key innovation for the Unstructured Information Management Architecture (UIMA): explaining text analysis as a series of high-level inference steps.
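
As a rough illustration of what tracing from a top-most conclusion back through the lowest-level inferences could look like, the sketch below records each step (whether produced by a text analysis component or by a reasoning rule) in one uniform structure and walks it backwards. It is not the mechanism proposed in the report and does not use the UIMA API; the `Inference` class, the `explain` function, the annotator names, and the example sentence are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inference:
    """One recorded step: a conclusion, what produced it, and its premises.

    Hypothetical structure for illustration; not part of UIMA or KANI.
    """
    conclusion: str
    rule: str
    premises: tuple = ()


def explain(step, depth=0):
    """Return indented lines tracing a conclusion back to its premises."""
    lines = ["  " * depth + f"{step.conclusion}  [{step.rule}]"]
    for premise in step.premises:
        lines.extend(explain(premise, depth + 1))
    return lines


# Made-up example: text analysis steps and a reasoning step share one format.
source = Inference("document contains 'Alice met Bob in Paris'", "source text")
person_a = Inference("'Alice' is a Person", "named-entity annotator", (source,))
person_b = Inference("'Bob' is a Person", "named-entity annotator", (source,))
place = Inference("'Paris' is a Location", "named-entity annotator", (source,))
meeting = Inference(
    "Alice met Bob in Paris", "relation annotator", (person_a, person_b, place)
)
hypothesis = Inference(
    "Alice was in Paris",
    "rule: participants of a meeting are at its location",
    (meeting,),
)

lines = explain(hypothesis)
print("\n".join(lines))
```

Because annotator outputs and rule applications are stored the same way, an analyst questioning the hypothesis could follow the same trace down to the source text without switching between two kinds of explanation.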

By: David Ferrucci

Published in: RC23372 in 2004

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
