Tandem Repeat Detection using Pattern Discovery with Applications to the Identification of Yeast Satellites

A rigorous definition of tandem repeats (TR) is proposed, and a sensitive method is presented for TR detection in genomic data. The heart of our method lies in organizing the information provided by a pattern discovery tool, TEIRESIAS. The latter is a model-less algorithm, and thus our method inherits the feature of not relying on a predetermined model. Using this technique, we made a systematic study of the TR in the yeast S. cerevisiae. We studied the distribution of periods and copy numbers as well as the proportion of TR in coding vs. non-coding regions. With respect to the latter, we found that for periods greater than 10, 60% of the TR lie in coding regions. We also discuss the phenomenon of nesting of TR (which occurs when a TR is contained within another TR locus). Finally, we study patterns that are common to the yeast TR.
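
To make the terms "period" and "copy number" concrete, the following is a minimal sketch of a naive scan for exact tandem repeats. It is not the TEIRESIAS-based method of the report, and it does not handle the approximate repeats a sensitive detector must find. The function name `exact_tandem_repeats` and its parameters `max_period` and `min_copies` are hypothetical, chosen for illustration only.

```python
def exact_tandem_repeats(seq, max_period=10, min_copies=2):
    """Return (start, period, copies) for exact tandem repeats in seq.

    Illustrative brute-force scan only; an assumption-laden sketch,
    not the pattern-discovery approach described in the report.
    """
    hits = []
    n = len(seq)
    for period in range(1, max_period + 1):
        for start in range(n - period * min_copies + 1):
            # Skip starts where the periodic run extends one base to the
            # left; the same locus is then reported from an earlier start.
            if start > 0 and seq[start - 1] == seq[start - 1 + period]:
                continue
            unit = seq[start:start + period]
            copies = 1
            # Count consecutive exact copies of the unit.
            while seq[start + copies * period:start + (copies + 1) * period] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((start, period, copies))
    return hits


if __name__ == "__main__":
    # "ACACAC" starts at offset 2 with period 2 and copy number 3.
    print(exact_tandem_repeats("GGACACACTT", max_period=3))
```

In this sketch a run such as "AAAA" is reported at period 1 and again at period 2, because non-primitive repeat units are not filtered out.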

By: Gustavo Stolovitzky, Y. Gao, A. Floratos, I. Rigoutsos

Published in: RC21508 in 1999

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
