rna22: A Unified Computational Framework for Discovering miRNA Precursors, Localizing Mature miRNAs, Identifying 3’ UTR Target-islands, and Determining the Targets of Mature-miRNAs

Mature microRNAs (miRNAs) are short (21-22 nt) RNAs that are enzymatically excised from longer, endogenously encoded precursors and have been shown to hybridize to mRNA transcripts, causing the transcripts’ downregulation through the RNA interference (RNAi) mechanism. Because of the importance of RNAi in the regulation of cell processes, the questions of how many miRNA precursors a given genome encodes and where they are located have been at the center of scientific research for several years. The arguably open question of cardinality notwithstanding, many miRNA precursors have already been reported in the literature for several genomes, and additional ones are continually sought. Equally important for shedding more light on the details of the RNAi process is determining the cardinality and location of the targets of these mature miRNAs, as well as the identity of the mature miRNA that will hybridize to a given target. Generally assumed to be located in the 3’ UTRs of mRNA transcripts, these miRNA targets have proven to be much more elusive and, despite a great amount of work by many scientists around the world, very few of them have been validated experimentally to date. Due to the high cost (in materials and time) of the experimental approach, computational methods are becoming increasingly important, as they can help focus the experimentalist’s attention and effort while maximizing the rate of experimental success. The computational methods available in the literature have generally treated the problems of miRNA precursor discovery, mature miRNA localization, miRNA-target-island determination and mature-miRNA/miRNA-target identification as separate tasks, with varying degrees of reported success.
In this paper, we present a method that simultaneously tackles the four problems of miRNA precursor discovery, mature miRNA localization, miRNA-target-island identification and mature-miRNA/miRNA-target determination in a single, uniform framework. To the best of our knowledge, this is the first method of its kind that addresses all these questions in a unified way. In contrast to some of the previously reported techniques that were developed for and focused on specific genomes, our method, rna22, is genome-independent and applies equally well to genomes spanning the spectrum from viruses to mammals. Key to our method is the use of a greatly redundant scheme for representing locally conserved signatures that are identified by processing the sequences of known precursors and mature miRNAs using an exhaustive pattern discovery technique. The use of local signatures liberates us from the limitations associated with seeking precursor-wide conservation across the genomes of related species, while permitting the identification of precursors and 3’ UTR target-islands that are potentially mosaic-like structures composed of known elemental blocks. Using a very extensive computational analysis, we examine the capabilities of our method and demonstrate that it a) identifies essentially all currently known miRNA precursors, b) very accurately locates the mature miRNAs in all known precursors, c) correctly predicts most of the 3’ UTR regions that have been shown to be targeted by known mature miRNAs, and d) correctly predicts a large percentage of the miRNA/mRNA-target pairs that have appeared in the literature. Additionally, our method has the very desirable characteristic of simultaneously exhibiting high sensitivity and high specificity.
We have used our method to analyze several genomes and to obtain revised estimates for the number of endogenously encoded miRNA precursors as well as for the number of 3’ UTR islands that will act as targets of one or more mature miRNAs. In summary, our analysis suggests that both of these numbers are likely to be substantially higher than initially believed. Taken together, our results suggest that there exists a very extensive combinatorial mechanism for carrying out post-transcriptional gene regulation within the cell and that the RNA interference-based regulation of cellular processes is a very pronounced and wide-ranging mechanism.

By: Isidore Rigoutsos; Kevin Miranda; Tien Huynh

Published in: RC23530 in 2005

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
