A Human-Computer Cooperative System For Effective High-Dimensional Clustering

Clustering problems are well known in the database literature for their use in numerous applications such as customer segmentation, classification, and trend analysis. High-dimensional data has always been a challenge for clustering algorithms because of the inherent sparsity of the points. Recent research results have shown that it is often not meaningful to find clusters in high-dimensional data with traditional measures of proximity, since such measures themselves become questionable in high dimensionality. Therefore, techniques have recently been proposed to find clusters in hidden subspaces of the data. However, since the behavior of the data may vary considerably in different subspaces, it is often difficult to define the notion of a cluster with the use of simple mathematical formalizations. In fact, the meaningfulness and definition of a cluster are best characterized with the use of human intuition. In this paper, we propose a system which performs high-dimensional clustering by effective cooperation between the human and the computer. The complex task of cluster creation is accomplished by a combination of human intuition and the computational support provided by the computer. The result is a system which leverages the best abilities of both the human and the computer in order to create very meaningful sets of clusters in high dimensionality.

By: Charu C. Aggarwal

Published in: KDD 2001. Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, ACM, p. 221-6, 2001

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

RC22103.pdf
