ETree: Effective and Efficient Event Modeling for Real-Time Online Social Media Networks

Online social media networks (OSMNs) such as Twitter provide great opportunities for public engagement and event information dissemination. Event-related discussions occur in real time and at a worldwide scale. However, these discussions take the form of short, unstructured messages that are dynamically woven into daily chats and status updates. Compared with traditional news articles, this rich and diverse user-generated content raises unique challenges for tracking and analyzing events. Effective and efficient event modeling is thus essential for real-time, information-intensive OSMNs.

In this work, we propose ETree, an effective and efficient event modeling solution for social media network sites. Targeting the unique challenges of this problem, ETree consists of three key components: (1) an n-gram based content analysis technique for identifying core information blocks from a large number of short messages; (2) an incremental and hierarchical modeling technique for identifying and constructing event theme structures at different granularities; and (3) an enhanced temporal analysis technique for identifying inherent causalities between information blocks. Detailed evaluation results using 3.5 million tweets over a 5-month period demonstrate that ETree can efficiently and incrementally generate high-quality event structures and identify inherent causal relationships with high accuracy.
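The abstract does not specify how ETree's n-gram based content analysis works. As a rough illustration only, the sketch below shows one generic way to surface candidate "information blocks" from short messages: count word n-grams by the number of distinct messages that contain them, and keep those that reach a support threshold. The tokenizer, the function names, and the `min_support` parameter are hypothetical assumptions, not ETree's actual method.

```python
import re
from collections import Counter


def tokenize(message: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, keep word tokens plus #hashtags and @mentions.
    return re.findall(r"[#@]?\w+", message.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    # All contiguous n-token windows; empty if the message has fewer than n tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(messages: list[str], n: int = 2, min_support: int = 2) -> dict:
    """Return n-grams that appear in at least `min_support` distinct messages."""
    support = Counter()
    for msg in messages:
        # Count each n-gram once per message, so support = number of messages containing it.
        support.update(set(ngrams(tokenize(msg), n)))
    return {gram: count for gram, count in support.items() if count >= min_support}


if __name__ == "__main__":
    tweets = [
        "Earthquake hits Japan coast",
        "Strong earthquake hits Japan today",
        "lunch was great",
        "earthquake hits japan, tsunami warning issued",
    ]
    # Sorted for deterministic output; expected:
    # [(('earthquake', 'hits'), 3), (('hits', 'japan'), 3)]
    print(sorted(frequent_ngrams(tweets, n=2, min_support=3).items()))
```

Counting per-message support rather than raw frequency keeps a single repetitive message from inflating an n-gram; a real system would likely also need stop-word handling and incremental updates as new messages arrive.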

By: Hansu Gu, Xing Xie, Qin Lv, Yaoping Ruan, Li Shang

Published as: IBM Research Report RC25202, 2011

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

