Generating and Browsing Multiple Taxonomies over a Document Collection

In this paper we present a novel system and methodology for generating and then browsing multiple taxonomies over a document collection. eClassifier is used to generate multiple taxonomies, each of which brings out a unique theme or relationship within the documents. These taxonomies then become a useful input for exploration of the document collection using our MindMap tool. Taxonomies are generated using a broad set of capabilities, including metadata, keyword queries, and automated clustering techniques that serve as a seed taxonomy. The taxonomy editor provides powerful tools to visualize and edit each taxonomy to make it reflective of the desired theme. Cluster validation tools allow the editor to verify that documents received in the future can be automatically classified into each taxonomy with sufficiently high accuracy.
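
As a rough illustration of how keyword queries could seed a taxonomy, the sketch below assigns each document to the first category whose keywords it contains. It is not eClassifier's actual algorithm, which this abstract does not specify. The function name `seed_taxonomy`, the `category_keywords` mapping, and the "Miscellaneous" fallback category are all assumptions made for this example.

```python
from collections import defaultdict


def seed_taxonomy(documents, category_keywords, default="Miscellaneous"):
    """Build a seed taxonomy from keyword queries (hypothetical helper).

    documents: dict mapping document id -> text
    category_keywords: dict mapping category name -> list of keywords
    Returns a dict mapping category name -> list of document ids.
    """
    taxonomy = defaultdict(list)
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        for category, keywords in category_keywords.items():
            # A document joins the first category sharing any keyword with it.
            if words & {k.lower() for k in keywords}:
                taxonomy[category].append(doc_id)
                break
        else:
            # No keyword matched: park the document in a catch-all category
            # that an editor could later split or merge by hand.
            taxonomy[default].append(doc_id)
    return dict(taxonomy)


documents = {
    "d1": "Printer driver fails to install",
    "d2": "Password reset request",
    "d3": "Lunch menu",
}
category_keywords = {"printing": ["printer"], "accounts": ["password", "login"]}
result = seed_taxonomy(documents, category_keywords)
```

In this sketch the catch-all category stands in for the documents that an editor would refine later, in the spirit of the taxonomy editing step described above.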

In general, those seeking knowledge from a document collection may have only a vague notion of exactly what they are attempting to understand, and would like to explore related topics and concepts rather than simply being given a set of documents. For this purpose, we have developed an interface utilizing multiple taxonomies and the ability to interact with a document collection. We call this interface MindMap. Starting from an initial keyword query, the MindMap interface helps the user to explore the concept space by first presenting the user with related terms and high-level topics in a radial graph. After refining the query by selecting any related terms, one of the related high-level concepts can be selected for further investigation. The MindMap uses a novel binary tree interface to explore the composition of a concept based on the presence or absence of terms.
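
The idea of a binary tree built on the presence or absence of terms can be sketched as follows. This is only a minimal illustration under stated assumptions, not the MindMap implementation: the function name `build_term_tree`, the dictionary-based node layout, and the fixed order of terms are all hypothetical, and the abstract does not say how MindMap chooses which term to split on.

```python
def build_term_tree(docs, terms):
    """Recursively split documents by presence or absence of each term.

    docs: dict mapping document id -> set of words in that document
    terms: list of terms to split on, in order (an assumed, fixed order)
    Returns nested dicts; leaves hold the sorted ids of the documents
    that reached them.
    """
    if not terms or not docs:
        return {"docs": sorted(docs)}
    term, rest = terms[0], terms[1:]
    present = {d: w for d, w in docs.items() if term in w}
    absent = {d: w for d, w in docs.items() if term not in w}
    return {
        "term": term,
        "count": len(docs),
        "present": build_term_tree(present, rest),
        "absent": build_term_tree(absent, rest),
    }


docs = {
    "a": {"disk", "error"},
    "b": {"disk"},
    "c": {"network"},
}
tree = build_term_tree(docs, ["disk", "error"])
```

Each internal node here splits its documents into those that contain the term and those that do not, so walking the tree shows how a concept breaks down by term usage.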

By: Scott Spangler, Jeffrey T. Kreulen, Justin Lessler

Published in: Journal of Management Information Systems, volume 19, no. 4, pages 191-212, 2003
