Prosciutto: A Tool for Analyzing the Contents of Massive Databases through Visual Display of 3-D Slices of Document Space

We present Prosciutto, a novel system for visualizing the contents of massive databases. Its notable features include similarity search based on vector space modeling and a service that recommends three mutually perpendicular coordinate axes in attribute space. Document vectors are projected onto the subspace spanned by these axes and displayed, helping users understand relationships between a query and the documents in the database. The system can also be used to find and understand clusters: major clusters and their sub-clusters can be easily distinguished from minor clusters when viewed in the 3-D subspace; distances between documents and between clusters can be clearly seen; and pop-up title and keyword windows let users quickly find the topics covered by documents in specific clusters. Prosciutto can analyze databases containing documents in heterogeneous formats, as long as the documents can be modeled as vectors. Its similarity search component can process hundreds of thousands of documents in dynamic databases.
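
The two ideas named in the abstract, similarity search in a vector space and projection onto three mutually perpendicular axes, can be illustrated with a minimal sketch. It is not Prosciutto's implementation. The abstract does not say how the system ranks documents or how it recommends axes, so this sketch assumes cosine similarity for ranking and Gram-Schmidt orthonormalization of three hand-picked seed vectors for the axes. All function names and the toy data are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu, nv = norm(u), norm(v)
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def rank_by_similarity(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    return sorted(range(len(docs)),
                  key=lambda i: cosine(query, docs[i]),
                  reverse=True)

def orthonormal_axes(seeds, eps=1e-12):
    """Gram-Schmidt: turn linearly independent seed vectors into
    mutually perpendicular unit-length axes."""
    axes = []
    for s in seeds:
        r = list(s)
        for a in axes:
            c = dot(r, a)  # component of r along an axis already chosen
            r = [x - c * y for x, y in zip(r, a)]
        n = norm(r)
        if n < eps:
            raise ValueError("seed vectors are linearly dependent")
        axes.append([x / n for x in r])
    return axes

def project(doc, axes):
    """Coordinates of a document vector in the subspace spanned by the axes."""
    return tuple(dot(doc, a) for a in axes)

# Toy data (assumption): four documents over four attributes, e.g. term weights.
docs = [
    [2.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]
query = [1.0, 0.0, 1.0, 0.0]

ranking = rank_by_similarity(query, docs)

# Assumed axis choice: seed with the query and two documents of interest.
axes = orthonormal_axes([query, docs[1], docs[3]])
coords = [project(d, axes) for d in docs]

for i in ranking:
    print(i, round(cosine(query, docs[i]), 3), [round(c, 3) for c in coords[i]])
```

Because the axes are orthonormal, each document's three coordinates can be plotted directly as a point in a 3-D view, and Euclidean distances between plotted points are the distances between the documents' projections onto that subspace.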

By: Masaki Aono and Mei Kobayashi

Published in: IPSJ Technical Report (DPS), 2002

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.
