Visualizing Document Classification: A Search Aid for the Digital Library

The recent explosion of the Internet and the World Wide Web has made digital libraries popular. Commercially available Web browsers give easy access to a digital library through a user-friendly interface. To retrieve documents of interest, the user is given a search interface that may consist of only one input field and one push button. Most users type in a single keyword, click the button, and hope for the best. The result of such a query can be a large unordered set of documents or a list of documents ranked by keyword frequency. Either result can contain articles unrelated to the user's inquiry unless the user performed a sophisticated search and knew exactly what to look for. More sophisticated algorithms that rank search results by how well they meet the user's needs, as expressed in the search input, may help. What is most needed, however, are software tools that can analyze the search results and manipulate large hierarchies of data graphically.

In this paper, we describe the design of a language-independent document classification system being developed to help users of the Florida Center for Library Automation analyze search query results. The system is accessible through the Web and provides a graphical user interface that displays the classification results. We also describe the use of this system to retrieve and analyze sets of documents from public Web sites.

By: Yew-Huey Liu, Paul Dantzig, Martin Sachs, James T. Corey, Mark T. Hinnebusch, Marc Damashek, Jonathan Cohen

Published in: Journal of the American Society for Information Science, volume 51, no. 3, pages 216-227, 2000