Obtaining Formal Knowledge from Informal Text Analysis

Populating formal knowledge bases from natural-language text is a long-standing objective in computer science. Recent advances in both ontology research and information extraction research are making this objective increasingly attainable. However, there are still serious obstacles to performing automated reasoning over the contents of text documents. This paper focuses on one of those obstacles: differences between the formal ontologies used by reasoning systems and the informal ontologies used by extraction systems. We describe a framework for automating translation from extracted information to formal knowledge, and we describe a complex, implemented system that uses this framework. We also describe results from this system applied to a moderately large (approximately 75 MB) text corpus.

By: J. William Murdock; Christopher Welty

Published as: Research Report RC23961, 2006


This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).


Questions about this service can be mailed to reports@us.ibm.com.