MindMap: Utilizing Multiple Taxonomies and Visualization to Understand a Document Collection

We present a novel system and methodology for browsing and exploring topics and concepts within a document collection. The process begins with the generation of multiple taxonomies from the document collection, each with its own theme. These taxonomies then become an integral tool in the exploration of the document collection. We assume that the user of our system may have only a vague notion of exactly what they are attempting to understand, and would like to explore related topics and concepts rather than simply being given a set of documents. For this purpose, we have developed the MindMap interface to the document collection. Starting from an initial keyword query, the MindMap interface helps the user explore the concept space by first presenting related terms and high-level topics in a radial graph. After the user refines the query by selecting any related terms, one of the related high-level concepts can be selected for further investigation. The MindMap uses a novel binary tree interface to explore the composition of a concept based on the presence or absence of terms. From the binary tree, a concept can be further explored and visualized. Individual documents are presented as points in a spatial layout, where the distance between points reflects document similarity. As the user browses this spatial representation, text is presented from the document that is most relevant to the user’s initial query. Individual points can be selected to pull up the relevant paragraphs from the document with the keywords highlighted. Finally, selected documents are displayed and the user can interact with and investigate them further.
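As a rough illustration of the binary tree idea described above, the following sketch splits the documents of one concept by the presence or absence of a term, then repeats the split on each branch. The abstract does not say how MindMap chooses or orders its split terms, so the function name `build_term_tree`, the `TermNode` structure, and the fixed term order are all assumptions made for illustration only, not the authors' method.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class TermNode:
    """Hypothetical node: a set of documents, optionally split on one term."""
    docs: List[int]                       # indices of documents in this node
    term: Optional[str] = None            # split term; None means a leaf
    present: Optional["TermNode"] = None  # documents that contain the term
    absent: Optional["TermNode"] = None   # documents that do not


def build_term_tree(doc_terms: List[Set[str]], docs: List[int],
                    terms: List[str]) -> TermNode:
    """Split docs on the first candidate term that separates them.

    Assumption: candidate terms are tried in the order given. A real
    system would likely rank terms by some measure of usefulness.
    """
    node = TermNode(docs=docs)
    for i, term in enumerate(terms):
        with_term = [d for d in docs if term in doc_terms[d]]
        without_term = [d for d in docs if term not in doc_terms[d]]
        if with_term and without_term:  # the term actually divides the set
            node.term = term
            remaining = terms[i + 1:]
            node.present = build_term_tree(doc_terms, with_term, remaining)
            node.absent = build_term_tree(doc_terms, without_term, remaining)
            break
    return node


# Toy data (invented for this example): three documents as term sets.
doc_terms = [{"printer", "driver"}, {"printer", "paper"}, {"network"}]
tree = build_term_tree(doc_terms, [0, 1, 2], ["printer", "driver"])
```

Here the root splits on "printer", and the branch where "printer" is present splits again on "driver". Each path from the root therefore describes a sub-group of the concept by which terms its documents do and do not contain.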

By: W. Spangler, J. T. Kreulen, J. T. Lessler

Published in: RJ10211 in 2001

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
