Contextual Analysis of User Interests in Social Media Sites - An Exploration with Micro-blogs

Since their inception, social media sites have evolved from
instant messaging and online networking to diverse, multi-faceted
entities that encompass the entire personality of an
individual. Recent advances in technology around mobile-based
access to social networking platforms and facilities to
update status information in real time (e.g. in Facebook)
have further allowed an individual's online presence to be
as ephemeral and dynamic in nature as her very thoughts
and interests. In this context, micro-blogging has been
widely adopted by users as an effective means to capture and
disseminate their thoughts and actions to a larger audience
on a daily basis. Interestingly, the daily chatter of a user
obtained from her micro-blogs offers a unique information
source to analyze and interpret her context in real time,
i.e. interests, intentions, and activities. Rich contextual
information about users allows social networking players to
develop value-added applications and associated business
models to monetize the same.
In this paper, we gather data from the public timeline
of Twitter (one of the most popular micro-blogging sites)
spanning ten cities worldwide over a period of four
weeks. We use this dataset to (a) explore how users
express interests in real time through micro-blogs, and (b)
understand how unstructured text mining techniques can
be applied to interpret the real-time context of a user based
on her tweets. Our findings provide evidence that social
media sites like Twitter constitute a promising source for
extracting user context that can be exploited by a multitude
of social networking applications.

By: Nilanjan Banerjee, Dipanjan Chakraborty, Koustuv Dasgupta, Anupam Joshi, Sameer Madan, Sumit Mittal, Seema Nagar, Angshu Rai

Published in: RI09012 in 2009

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

RI09012.pdf

Questions about this service can be mailed to reports@us.ibm.com.