ClusterScope: An Information Retrieval and Cluster Analysis Tool for Massive Databases

We present ClusterScope, a novel system for probing the contents of massive databases
through information retrieval and cluster identification. It is novel in that it identifies
both major and minor clusters and allows clusters to overlap.
Its search engine is capable of processing hundreds of thousands of heterogeneously
formatted documents in real time. The GUI also offers a service that recommends
three mutually perpendicular coordinate axes in attribute space onto which
document vectors can be projected and displayed. These views may help
users understand relationships between a query and documents in the database.
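
The projection step described above can be illustrated with a minimal sketch. The abstract does not say how ClusterScope selects its three axes, so this example assumes they are the leading right singular vectors of the centered document-attribute matrix; the function names `top3_axes` and `project` and the random data are hypothetical.

```python
import numpy as np

def top3_axes(doc_vectors):
    """Return three mutually orthogonal unit axes (as rows) in attribute space.

    Assumption: the axes come from an SVD of the centered document matrix.
    The source does not state the method ClusterScope actually uses.
    """
    # Center so the axes capture variation rather than the mean document.
    centered = doc_vectors - doc_vectors.mean(axis=0)
    # Rows of vt are orthonormal directions in attribute space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:3]

def project(vectors, axes):
    """Coordinates of each row vector along the given axes."""
    return vectors @ axes.T

# Hypothetical data: 50 documents and one query, 8 attributes each.
rng = np.random.default_rng(0)
docs = rng.random((50, 8))
query = rng.random((1, 8))

axes = top3_axes(docs)
doc_xyz = project(docs, axes)      # 3-D coordinates for display
query_xyz = project(query, axes)   # query placed in the same view
```

Plotting `doc_xyz` together with `query_xyz` in one 3-D scatter view shows where the query falls relative to the documents along the chosen axes.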

By: Mei Kobayashi and Masaki Aono

Published in: EMAC 2002 (Engineering Maths and Applications Conference), Australia, 2002

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.