Topical Document Clustering

We propose a novel document clustering method that, at a high level, seeks to uncover the topics underlying documents and to generate clusters accordingly. The process is driven by topical phrases. We rely on a subspace projection technique to find topical phrases and to measure how well those phrases describe the documents. Our evaluation on manually topic-labeled documents shows that the proposed method significantly outperforms all the other methods tested. A useful by-product of this method is a set of 'phrase trees', which can serve as cluster summaries when presented to users.
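
The abstract does not specify the projection technique, so the following is only a minimal sketch of the general idea, not the report's method. It assumes a plain truncated SVD of a document-term matrix as the subspace projection, and it assumes that candidate phrases are already given as term vectors. The vocabulary, the toy counts, and the function name `cluster_by_topical_phrase` are all hypothetical. Each document is assigned to the phrase it most resembles in the projected subspace.

```python
import numpy as np

def cluster_by_topical_phrase(doc_term, phrase_term, k=2):
    """Assign each document to its best-matching phrase in a k-dim subspace.

    Assumption: truncated SVD stands in for the report's (unspecified)
    subspace projection technique.
    """
    # Right singular vectors of the document-term matrix span the subspace.
    _, _, vt = np.linalg.svd(doc_term, full_matrices=False)
    basis = vt[:k].T  # shape: (n_terms, k)

    # Project documents and candidate phrases into the same subspace.
    docs = doc_term @ basis
    phrases = phrase_term @ basis

    # Normalize rows so the dot product below is cosine similarity.
    docs = docs / (np.linalg.norm(docs, axis=1, keepdims=True) + 1e-12)
    phrases = phrases / (np.linalg.norm(phrases, axis=1, keepdims=True) + 1e-12)

    similarity = docs @ phrases.T  # shape: (n_docs, n_phrases)
    return similarity.argmax(axis=1), similarity

# Hypothetical vocabulary: ["stock", "market", "soccer", "goal"]
doc_term = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 3],
], dtype=float)

# Hypothetical candidate phrases: "stock market", "soccer goal"
phrase_term = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
], dtype=float)

labels, similarity = cluster_by_topical_phrase(doc_term, phrase_term, k=2)
print(labels)
```

In this sketch the similarity scores play the role of measuring how well a phrase describes a document; the report itself may define that measure differently.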

By: Rie Kubota Ando

Published in: RC23023 in 2003


