Data Tagging Architecture for System Monitoring in Dynamic Environments

Large enterprise systems need continuous monitoring at the infrastructure, application, and business levels to detect and prevent problem situations. Traditionally, automated monitoring solutions are programmed once at setup, based on a set of well-defined monitoring objectives, and then handed over to the operations team. Such solutions have underlying data models that are often complex and semantically rich, but in stable environments this complexity is generally hidden from the operations team, who only need to make minor configuration changes (e.g., setting thresholds) as and when required. However, the situation is changing rapidly: enterprise data centers are subject to continuous transformation as new software, hardware, and process components are deployed or updated. This places an immense burden on monitoring, because not only must thousands of different parameters be monitored, but service level objectives (SLOs) may also be added and modified continuously.

We describe a monitoring system architecture that simplifies the task of authoring and managing SLOs in such dynamic and heterogeneous environments. At the heart of our approach is a lightweight and extensible data model, derived from more complex configuration models, that exposes to the operations team only the data relevant for monitoring. Simple string tags derived from this model are then used to label SLOs and their associated data streams. The approach localizes programming to the data-sensor layer and makes authoring simpler than specifying objects in an alternative, richer but more complex object-oriented representation. We also describe a tag-driven real-time visualization tool that organizes data streams by their accompanying tags and eases user navigation through large volumes of monitoring data.
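To make the tagging idea concrete, the following is a minimal illustrative sketch, not the system described in the report. It assumes that tags are plain `key:value` strings, that an SLO applies to any data sample carrying all of the SLO's tags, and that an SLO is a simple upper threshold. The names `Sample`, `SLO`, `parse_tags`, and `group_by_tag`, as well as the tag syntax itself, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


def parse_tags(text):
    """Turn a comma-separated string such as 'app:billing, metric:cpu' into a tag set."""
    return frozenset(t.strip() for t in text.split(",") if t.strip())


@dataclass(frozen=True)
class Sample:
    """One reading from a data sensor, labeled with string tags (hypothetical shape)."""
    tags: frozenset
    value: float


@dataclass
class SLO:
    """An objective authored purely in terms of tags and an upper threshold (assumed form)."""
    name: str
    tags: frozenset
    threshold: float

    def applies_to(self, sample):
        # The SLO covers a sample when every SLO tag is present on the sample.
        return self.tags <= sample.tags

    def violated_by(self, sample):
        return self.applies_to(sample) and sample.value > self.threshold


def group_by_tag(samples, prefix):
    """Organize samples by every tag starting with `prefix`, e.g. 'host:' for a per-host view."""
    groups = defaultdict(list)
    for sample in samples:
        for tag in sample.tags:
            if tag.startswith(prefix):
                groups[tag].append(sample)
    return dict(groups)


# Example data: three CPU readings from hypothetical hosts and applications.
samples = [
    Sample(parse_tags("app:billing, host:db01, metric:cpu"), 93.0),
    Sample(parse_tags("app:billing, host:web02, metric:cpu"), 41.0),
    Sample(parse_tags("app:search, host:db01, metric:cpu"), 97.0),
]

# Authoring an SLO needs only tags and a threshold, with no object model.
billing_cpu = SLO("billing-cpu", parse_tags("app:billing, metric:cpu"), 90.0)
violations = [s for s in samples if billing_cpu.violated_by(s)]
by_host = group_by_tag(samples, "host:")
```

In this sketch, a newly deployed component only has to emit samples with suitable tags for existing SLOs and views to pick it up, which is one way to read the claim that programming stays localized to the data-sensor layer.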

By: Bharat Krishnamurthy, Anindya Neogi, Bikram Sengupta, Raghavendra Singh

Published in: RI07008 in 2007

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

