Matrix Computations for Detecting and Visualizing Outlier Clusters

We propose two novel algorithms for detecting major clusters
(i.e., clusters that comprise more than 4% of the documents in a database)
and outlier clusters or outliers (i.e., clusters that comprise 3% to 4% of the
documents in a database). We also introduce a visualization system that enables users
to view, manipulate, and understand the output of our algorithms through a simple
3-dimensional graphical user interface. Some fairly successful techniques have
been developed to identify major clusters; however, these techniques often fail to
identify outliers. Outliers in very large databases can represent valuable information,
for example, unusual spending patterns due to fraudulent use of credit cards,
customers who have a high probability of defaulting on loan payments, and small but
emerging trends in customer claims and satisfaction. Our two algorithms are based
on information retrieval algorithms that use vector space modeling: the latent
semantic indexing (LSI) algorithm of Deerwester et al. and the covariance matrix
analysis (COV) algorithm of Kobayashi et al.
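
For readers unfamiliar with the two underlying retrieval methods, the following is a minimal sketch of the general idea, not the outlier detection algorithms proposed in this report. It assumes a small, made-up document-by-term matrix `A` (rows are documents, columns are terms), and the helper names `lsi_basis`, `cov_basis`, and `project` are hypothetical. The LSI-style basis comes from a truncated singular value decomposition of `A`; the COV-style basis comes from the leading eigenvectors of the covariance matrix of the document vectors. The covariance normalization used by `numpy.cov` may differ from the one in the original COV formulation, which affects the eigenvalues but not the eigenvectors.

```python
import numpy as np

def lsi_basis(A, k):
    """Top-k right singular vectors of the document-by-term matrix A (LSI-style)."""
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    return vt[:k]  # shape (k, n_terms), rows are orthonormal

def cov_basis(A, k):
    """Top-k eigenvectors of the covariance matrix of the document vectors (COV-style)."""
    C = np.cov(A, rowvar=False)            # (n_terms, n_terms); columns are variables
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
    return eigvecs[:, order].T             # shape (k, n_terms)

def project(A, basis):
    """Coordinates of each document in the reduced k-dimensional subspace."""
    return A @ basis.T

# Toy data (assumed for illustration): three similar documents and one that differs.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [3, 2, 0, 0],
    [0, 0, 1, 2],
], dtype=float)

lsi_coords = project(A, lsi_basis(A, 2))
cov_coords = project(A, cov_basis(A, 2))
```

Documents can then be compared or grouped in the reduced coordinates, for example by cosine similarity between rows of `lsi_coords` or `cov_coords`.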

By: Mei Kobayashi, Masaki Aono, Hironori Takeuchi, Hikaru Samukawa

Published in: RT0434 in 2002

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

rt0434.pdf

Questions about this service can be mailed to reports@us.ibm.com.