Clustering Algorithms for Content-Based Publication-Subscription Systems

We consider efficient communication schemes based on both network-supported and application-level multicast techniques for content-based publication-subscription systems We show that the communication costs depend heavily on the network con&urations, distribution of publication and subsctiptions. We propose to use clustering algorithms to form multicast groups. We devise new algorithms and adapt partitional data clustering algorithms for these content-basedpublication-subscriprion systems. These algorithms can be used to determine multicast groups with
as much commonality as possible, based on the totality of subscribers’ interests. These algotithms perform well in the context of highly heterogeneous subscriptions, and they also scale well. An efficiency of 60% to 80% with respect to the
ideal solution can be achieved with a small number of multicast groups (less than 100 groups in our experiments). Some of these same concepts can be applied to match publications to subscribers in real-time, and also to determine dynamically
whether to unicast, multicast or broadcast information about the events over the network to the matchedsubscribers. We demonstrate the quality of our algorithms via a set of simulation experiments.

By: Zhen Liu, Joel L. Wolf, Li Zhang, Philip S. Yu, Anton Riabov

Published in: RC22404 in 2002

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

RC22404.pdf

Questions about this service can be mailed to reports@us.ibm.com .