Data Mining and the IBM Official 1996 Olympic Web Site

Data mining of Web server logs offers the potential for deep analysis of visitor activity at a
Web site, activity that is usually unknown beforehand. For instance, the discovery of association
rules can show how visitors form groups with common interests, while a sequential patterns
algorithm can uncover the most trodden paths at a site. In fact, applying a variety of data mining
techniques can extract information that no technique applied alone could generate. This paper
describes the application of a range of statistical and data mining techniques to the 1996 Olympic
Web Site access logs. This paper also demonstrates and emphasizes the benefits of classifying
documents hosted at a Web site. Classification provides a general, high-level description of the
document collection, a description independent of individual documents. By augmenting documents
with multiple descriptive attributes, the site can be analyzed from several complementary
perspectives. The results presented in the paper are accompanied by suggested applications to
other Internet services, such as merchandise servers, personalized pages, and on-line databases.
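
As a rough illustration of the two techniques named in the abstract, the sketch below derives pairwise association rules and frequent contiguous paths from a handful of visitor sessions. It is not the method used in the report: the session data, the page category names, the function names, and the thresholds are all hypothetical, and counting contiguous page pairs is a deliberate simplification of a full sequential patterns algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: the ordered page categories requested by each visitor.
sessions = [
    ["home", "sports", "results"],
    ["home", "sports", "results", "medals"],
    ["home", "news"],
    ["home", "sports", "medals"],
]


def association_rules(sessions, min_support=0.5, min_confidence=0.7):
    """Return (a, b, support, confidence) tuples: visitors of a also visit b."""
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for s in sessions:
        pages = sorted(set(s))  # order within the session is ignored here
        item_counts.update(pages)
        pair_counts.update(combinations(pages, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Test the rule in both directions, since confidence is asymmetric.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


def frequent_paths(sessions, length=2, min_support=0.5):
    """Return contiguous page sequences of the given length and their support."""
    n = len(sessions)
    counts = Counter()
    for s in sessions:
        # A set ensures each path is counted at most once per session.
        paths = {tuple(s[i:i + length]) for i in range(len(s) - length + 1)}
        counts.update(paths)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    for rule in association_rules(sessions):
        print(rule)
    print(frequent_paths(sessions))
```

Association rules ignore the order of requests and describe which pages tend to be visited together, whereas the path count keeps the order and so reflects navigation through the site. Working on page categories rather than individual URLs mirrors the abstract's point that classifying documents gives a description independent of individual documents.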

By: Sara Elo-Dean (IBM Southbury), Marisa Viveros

Published in: RC20714 in 1997


This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
