Redescriptions: Structure Theory and Algorithms

We present a new approach to mining redescriptions -- patterns that identify subsets of data that afford multiple definitions. Redescription mining finds important applications in descriptor-rich datasets, such as in bioinformatics. The key contributions of this paper are (i) identifying the existence of a dichotomy law that the redescriptions follow (ii) definition of a notion of irredundant redescriptions underyling a dataset, (iii) an output-sensitive algorithm to mine the irredundant set of redescriptions in a specific form, both exact and approximate, (iv) identifying an important connection between biclusters and redescriptions, using which we can build a redescription mining algorithm around a biclustering algorithm.

By: Laxmi Parida; Naren Ramakrishna

Published in: RC23565 in 2005

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rc23565.pdf

Questions about this service can be mailed to reports@us.ibm.com .