Data Mining and the IBM Official 1996 Olympic Web Site

Data mining of Web server logs offers the potential for deep analysis of visitor activity at a Web site, activity that is usually unknown beforehand. For instance, the discovery of association rules can show how visitors form groups with common interests, while a sequential patterns algorithm can uncover the most trodden paths at a site. In fact, applying a variety of data mining techniques can extract information that no single technique could generate alone. This paper describes the application of a range of statistical and data mining techniques to the 1996 Olympic Web Site access logs. This paper also demonstrates and emphasizes the benefits of classifying documents hosted at a Web site. Classification provides a general, high-level description of the document collection, a description independent of individual documents. By augmenting documents with multiple descriptive attributes, the site can be analyzed from several complementary perspectives. The results presented in the paper are accompanied by suggested applications to other Internet services, such as merchandise servers, personalized pages, and on-line databases.
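
As an illustration of the association-rule idea mentioned above, the following is a minimal sketch and not the method used in the paper. It assumes each visitor session has already been reduced to a list of document categories; the function name `pair_rules`, the thresholds, and the sample sessions are all hypothetical. It computes support and confidence for rules between pairs of categories only.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for pairwise rules lhs -> rhs.

    sessions: list of lists of category labels, one list per visitor session
    (a hypothetical input shape, not the format of the real access logs).
    """
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        items = sorted(set(session))          # count each category once per session
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                   # share of sessions containing both
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]   # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Made-up sessions for illustration only.
sessions = [
    ["swimming", "results", "medals"],
    ["swimming", "results"],
    ["gymnastics", "medals"],
    ["swimming", "medals", "results"],
]
rules = pair_rules(sessions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

Working on categories rather than individual pages mirrors the point about classification: rules stated over a small set of descriptive attributes stay readable even when the underlying document collection is large.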

By: Sara Elo-Dean (IBM Southbury), Marisa Viveros

Published in: RC20714 in 1997


