Traffic Profiling, Clustering and Classification for Commercial Web Sites

The problems of workload characterization, performance modeling, workload and performance forecasting, and capacity planning are fundamental to the growth of Web services and applications. Previous studies have primarily focused on the complexity of Web traffic at the level of object-hits or page-views. In contrast, our study focuses on higher-level characteristics and introduces techniques for profiling, clustering and classification of Web site traffic. In particular, we devise novel techniques for efficient and automated extraction of Web traffic patterns from access logs, for efficient and automated clustering of such traffic patterns, and for efficient and automated classification of Web traffic based on the extraction and clustering of traffic templates. Our approach has been applied to more than 25 existing commercial Web sites, and it has been demonstrated that it can accurately capture and characterize the complexities of Web traffic in commercial Web sites. These methods provide new solutions to challenging problems such as workload and performance prediction and short-term and long-term capacity planning.

By: Zhen Liu, Mark S. Squillante, Cathy Honghui Xia, Shun-Zheng Yu, Li Zhang

Published in: RC22567 in 2002

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

