Parallel and Distributed Triangle Counting on Graph Streams

This paper studies the problem of approximating the number of triangles in a graph whose edges arrive as a stream. The number of triangles in a graph is an important metric in social network analysis, link classification and recommendation, and more. We present efficient algorithms for both shared-memory parallel (multicore-like) processing of a single stream where edges arrive in batches and processing of streams by multiple physically distributed processors. For the shared-memory setting, we give the first parallel cache-oblivious algorithm with good theoretical guarantees for accuracy; we also experimentally verify that the algorithm obtains substantial speedups and accurate estimates. For the distributed setting, we present a distributed algorithm with low message complexity, improving upon existing sketch-based algorithms in common settings.

By: A. Pavan, Kanat Tangwongan, Srikanta Tirthapura

Published in: RC25352 in 2013

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rc25352.pdf

Questions about this service can be mailed to reports@us.ibm.com .