Identifying Bundles of Product Options using Mutual Information Clustering

Mass-produced goods tend to be highly standardized in order to maximize manufacturing efficiencies. Some high-value goods with limited production quantities remain much less standardized, and each sale can be configured to meet the specific requirements of the customer. In this work we suggest a novel methodology to reduce the number of options for complex product configurations by identifying meaningful sets of options that exhibit strong empirical dependencies in previous customer orders. Our approach explores different measures from statistics and information theory to capture the degree of interdependence between the choices for any pair of product components. We use hierarchical clustering to identify meaningful sets of components that can be combined to decrease the number of unique product specifications and increase production standardization. The focus of our analysis is on the influence of different similarity measures, including chi-squared statistics and versions of mutual information, on the ability of the clustering to find meaningful clusters.
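To make the idea concrete, the following is a minimal Python sketch of the general approach described above, not the authors' implementation. It assumes past orders are available as one list of chosen options per component, uses mutual information divided by the smaller marginal entropy as one possible normalized similarity, and merges components by average-linkage agglomerative clustering until no pair of clusters exceeds a threshold. The toy data, the function names, the normalization, and the threshold value are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(xs):
    """Empirical entropy (in nats) of a categorical sequence."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two categorical sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # Sum of p(x,y) * log(p(x,y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def normalized_mi(xs, ys):
    """MI scaled to [0, 1] by the smaller marginal entropy (an assumed normalization)."""
    h = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / h if h > 0 else 0.0


def cluster_components(orders, threshold):
    """Average-linkage agglomerative clustering on pairwise similarity.

    Merging stops once the most similar pair of clusters falls below `threshold`.
    """
    sim = {frozenset(p): normalized_mi(orders[p[0]], orders[p[1]])
           for p in combinations(orders, 2)}
    clusters = [[name] for name in orders]

    def linkage(a, b):
        return sum(sim[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical toy data: each list holds the option chosen in eight past orders.
orders = {
    "engine":       ["v6", "v8", "v6", "v8", "v6", "v8", "v6", "v8"],
    "transmission": ["manual", "auto", "manual", "auto", "manual", "auto", "manual", "auto"],
    "paint":        ["red", "red", "blue", "blue", "red", "red", "blue", "blue"],
    "radio":        ["basic", "basic", "basic", "basic", "premium", "premium", "premium", "premium"],
}

clusters = cluster_components(orders, threshold=0.5)
print(clusters)
```

In this toy data the engine and transmission choices always co-occur, so they are merged into one candidate bundle, while paint and radio vary independently of everything else and remain separate. Swapping `normalized_mi` for another pairwise measure, such as a chi-squared statistic, is how the effect of the similarity measure on the resulting clusters could be compared.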

By: Claudia Perlich; Saharon Rosset

Published in: RC24168 in 2007

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
