Interactive Methods for Taxonomy Editing and Validation

Today’s enterprise understands that better use of its collective knowledge assets leads to better business performance. The proliferation of electronic information, together with pressure to produce more with fewer resources while performing increasingly complex tasks, makes this a continuous challenge. To address this challenge and create value where there is currently chaos, enterprises are building knowledge repositories and structuring them in ways that are meaningful to their organization, business and processes. This structuring typically takes the form of one or more taxonomies: meaningful hierarchical categorizations of documents into topics that reflect the natural relationships between the documents and their business objectives. Improving the quality of these taxonomies and reducing the overall cost of creating them is therefore an important area of research. Supervised and unsupervised text clustering are important technologies, but they comprise only part of a complete solution. There remains a great need for a human to interact efficiently with a taxonomy during the editing and validation phase. We have developed a comprehensive approach to this problem and implemented it in a software tool called eClassifier. eClassifier provides features that help the taxonomy editor understand and evaluate each category of a taxonomy and visualize the relationships between the categories. Multiple techniques allow the user to make changes at both the category and document level. Metrics then establish how well the resultant taxonomy can be modeled for future document classification. eClassifier also enables the development of multiple taxonomies so that multiple relationships in the documents can be modeled.
In this paper, we present a comprehensive set of viewing, editing and validation techniques that we have implemented in the Lotus Discovery Server, resulting in a significant reduction in the time required to create a quality taxonomy.

By: Scott Spangler, Jeffrey Kreulen

Published in: Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002). New York, ACM, 2002, pp. 665-668.

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.