A Framework for Exploration of News Corpora by Actor Evolution and Interaction

We present a general framework for modeling and exploring news corpora. The natural way to model a news corpus is as a directed graph in which stories are linked to one another through a variety of relationships. We formalize this notion by viewing each news story as a set of actors, and by viewing links between stories as transformations that these actors go through. We propose and model a simple and comprehensive set of transformations: create, merge, split, continue, and cease. These transformations capture the evolution of a single actor as well as interactions among multiple actors. We present metrics that assign a score to each discovered transformation. These scores quantify the importance of individual events and aid in ranking the transformations. We show how ranking helps us infer important relationships between actors and stories in a corpus. Next, the derived transformations and their associated rankings are used to generate a news graph. To handle the large size of the graph, we propose a summarization scheme that again leverages the derived transformations. Finally, we propose an interface that helps users explore the corpus interactively and find information of interest iteratively. We demonstrate the effectiveness of our notions by experimenting on large news corpora.
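As a rough illustration of the five transformations named above, the following sketch labels links between the actors of two consecutive time windows. The abstract does not say how actors are represented or matched, so everything here is an assumption made for illustration: actors are modeled as keyword sets, two actors are linked when their Jaccard overlap reaches a hypothetical `threshold`, and the function name `classify_transformations` is invented. It is not the report's method and does not include the scoring metrics.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard overlap of two keyword sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_transformations(old_actors, new_actors, threshold=0.3):
    """Label actor transformations between two windows.

    Returns a list of (kind, old_indices, new_indices) tuples, where kind is
    one of: create, merge, split, continue, cease.
    """
    # Link every old/new pair whose overlap meets the (assumed) threshold.
    links = [
        (i, j)
        for (i, a), (j, b) in product(enumerate(old_actors), enumerate(new_actors))
        if jaccard(a, b) >= threshold
    ]
    successors = {i: [j for x, j in links if x == i] for i in range(len(old_actors))}
    predecessors = {j: [i for i, y in links if y == j] for j in range(len(new_actors))}

    events = []
    for i, targets in successors.items():
        if not targets:
            events.append(("cease", (i,), ()))               # no successor
        elif len(targets) > 1:
            events.append(("split", (i,), tuple(targets)))   # several successors
    for j, sources in predecessors.items():
        if not sources:
            events.append(("create", (), (j,)))              # no predecessor
        elif len(sources) > 1:
            events.append(("merge", tuple(sources), (j,)))   # several predecessors
    for i, j in links:
        if len(successors[i]) == 1 and len(predecessors[j]) == 1:
            events.append(("continue", (i,), (j,)))          # one-to-one link
    return events


# Toy data, invented for this example.
old = [
    {"election", "vote", "party"},
    {"strike", "union"},
    {"merger", "bank"},
    {"flood", "relief"},
]
new = [
    {"election", "vote", "party", "result"},
    {"strike", "union", "merger", "bank"},
    {"olympics", "medal"},
]
events = classify_transformations(old, new)
```

On this toy input, the election actor continues, the strike and bank actors merge, the flood actor ceases, and the olympics actor is created. A ranking step of the kind the abstract describes would then score each of these events; that step is left out because the metrics are not specified here.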

By: Rohan Choudhary, Sameep Mehta, Amitabha Bagchi and Rahul Balakrishnan

Published in: RI07004 in 2007


This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

