Evolutionary Clustering and Analysis of Heterogeneous Information Networks

In this paper, we study the problem of evolutionary clustering of multi-typed objects in a heterogeneous bibliographic network. Traditional homogeneous clustering methods do not produce good typed clusterings. Heterogeneous clustering methods, by contrast, can help us understand the evolution of each object type in addition to the evolution of the network as a whole. In fact, the problems of clustering and evolution diagnosis are closely related, because clustering summarizes the network and thereby provides insights into how the objects change over time. We present a tightly integrated method for clustering and evolution diagnosis of heterogeneous bibliographic information networks. Our algorithm, ENetClus, performs agglomerative evolutionary clustering and uses a temporal smoothness approach to show how the clusters vary over time. Previous work on clustering networks either handles evolution only for homogeneous graphs, or clusters heterogeneous networks without accounting for evolution. This paper provides the first framework for evolution-sensitive clustering and diagnosis of heterogeneous information networks. ENetClus generates consistent typed clusterings across time, which can be used for further evolution diagnosis, and the framework is specifically designed to facilitate insights about the evolution process. We use this technique to provide novel insights about bibliographic information networks.
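The abstract does not spell out how ENetClus enforces temporal smoothness. As a generic illustration of the idea only (not the ENetClus algorithm, which works on typed, heterogeneous networks), the sketch below blends one snapshot's object-similarity matrix with the previous snapshot's and then runs plain average-link agglomerative clustering. The functions smooth_similarity and agglomerate, the blending weight alpha, and the toy matrices are all hypothetical.

```python
# Hypothetical sketch of temporal smoothness in agglomerative clustering.
# This is NOT ENetClus; it only illustrates the general smoothing idea.

def smooth_similarity(current, previous, alpha):
    """Blend the current snapshot's similarity matrix with the previous one.

    alpha = 1.0 ignores history; smaller alpha favours temporal smoothness.
    """
    n = len(current)
    return [[alpha * current[i][j] + (1 - alpha) * previous[i][j]
             for j in range(n)] for i in range(n)]


def agglomerate(sim, k):
    """Average-link agglomerative clustering until k clusters remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average similarity over all cross-cluster object pairs.
                links = [sim[i][j] for i in clusters[a] for j in clusters[b]]
                score = sum(links) / len(links)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy data (hypothetical): at the previous time step, objects {0, 1} and
# {2, 3} were tightly grouped; the current snapshot is noisier.
previous = [[1.0, 0.9, 0.1, 0.1],
            [0.9, 1.0, 0.1, 0.1],
            [0.1, 0.1, 1.0, 0.9],
            [0.1, 0.1, 0.9, 1.0]]
current = [[1.0, 0.2, 0.1, 0.1],
           [0.2, 1.0, 0.7, 0.5],
           [0.1, 0.7, 1.0, 0.6],
           [0.1, 0.5, 0.6, 1.0]]

no_history = agglomerate(smooth_similarity(current, previous, 1.0), 2)
smoothed = agglomerate(smooth_similarity(current, previous, 0.5), 2)
```

Without history (alpha = 1.0) the noisy snapshot yields [[0], [1, 2, 3]]; with alpha = 0.5 the previous grouping [[0, 1], [2, 3]] is preserved, which is the kind of consistency across time that a temporal smoothness term is meant to encourage.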

By: Manish Gupta; Charu Aggarwal; Jiawei Han; Yizhou Sun

Published in: IBM Research Report RC25012, 2010

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

