Research Issues in Supporting Data Intensive Applications within an Exascale System

Analyzing large graphs are crucial to a variety of applications domains, like personalized recommendations in social networks, search engines, communication networks, computational biology, etc. In these domains, there is a need to process aggregation queries over large graphs. Existing approaches for aggregation are not suitable for large graphs, as they involve multi-way relational over gigantic tables or repeated multiplication of large matrices.

In this report, we consider the top-K aggregation queries that involve identifying top-K nodes with highest aggregate values over their h-hop neighbors. We propose algorithms for processing such queries over large graphs in a shared nothing environment. We propose a hybrid algorithm that minimizes network loads is shuffling data across the processing nodes. The algorithm partitions the graph across the processing nodes, and uses Floyd-Warshall algorithm within each nodes. The nodes shuffles updates among themselves in iterative phases; such incremental iterative processing is similar to route discover in a large network. The algorithm needs only a few iterations to converge to an equilibrium state.

By: Abhirup Chakraborty, Dilma Da Silva

Published in: RC25302 in 2012

rc25302.pdf

Questions about this service can be mailed to reports@us.ibm.com .