Within-cluster adaptive metric for clustering

The clustering problem concerns the discovery of homogeneous groups of data
according to a certain similarity measure. Clustering suffers from the curse of dimensionality. It is not meaningful to look for clusters in high dimensional spaces as the average density ofpoints anywhere in input space is likely to be low. As a consequence, distance functions that equally use all input features may be ineffective. We introduce an algorithm that discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features. This approach avoids the risk of loss of information encountered in global dimensionality reduction techniques. Our method associates to each cluster a weight vector, whose values give
information of the degree of relevance of features for each set in the partition. We formally prove that our algorithm converges, and experimentally demonstrate the gain in perfomance we achieve with our method.

By: Carlotta Domeniconi, D Gunopulos, Sheng Ma

Published in: RC22488 in 2002

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

RC22488.pdf

Questions about this service can be mailed to reports@us.ibm.com .