Preliminary Results On the Discovery of Patterns of Amino Acids Common to Sequences of Leghemoglobins

Pattern discovery, an NP-hard problem, is considerably more challenging than pattern matching, which is solvable in polynomial time. Using Teiresias, a newly developed algorithm, we have explored (see [5]) a number of test cases in order to identify patterns of amino acids that appear across sequences (common motifs) as well as within individual sequences (internal repeats). In that work, only a small subset of the available results was presented, in order to: (a) validate the approach through the discovery of previously reported patterns; (b) demonstrate the algorithm's ability to automatically identify highly selective patterns particular to the sequences under consideration; and (c) discover previously unidentified patterns in the well-studied example cases used. One of those example cases was the leghemoglobin family of proteins. In this report, we present the full range of results obtained for this particular protein family.
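The contrast between matching and discovery can be illustrated with a minimal sketch. This is not the Teiresias algorithm; it is a brute-force toy, and the function names (`match_positions`, `discover`), the parameters (`width`, `min_literals`, `min_support`), and the three short sequences are all hypothetical, chosen only for illustration. Matching a known pattern is a single scan per sequence, whereas the naive discovery below enumerates every wildcard placement in every window, which grows exponentially with the window width.

```python
from itertools import combinations


def match_positions(pattern, sequence):
    """Pattern matching: start positions where a KNOWN pattern occurs.

    '.' in the pattern is a wildcard that matches any single residue.
    The cost is proportional to len(pattern) * len(sequence).
    """
    n = len(pattern)
    return [
        i
        for i in range(len(sequence) - n + 1)
        if all(p == "." or p == c for p, c in zip(pattern, sequence[i:i + n]))
    ]


def discover(sequences, width, min_literals, min_support):
    """Naive pattern discovery: find UNKNOWN patterns shared by sequences.

    Every window of the given width is turned into candidate patterns by
    replacing inner positions with wildcards (first and last stay literal).
    Each window yields up to 2 ** (width - 2) candidates, which is why this
    brute-force approach does not scale.
    """
    support = {}  # pattern -> set of indices of sequences containing it
    inner = range(1, width - 1)
    for idx, seq in enumerate(sequences):
        for i in range(len(seq) - width + 1):
            window = seq[i:i + width]
            for r in range(len(inner) + 1):
                if width - r < min_literals:
                    break  # too few literal residues would remain
                for wild in combinations(inner, r):
                    pattern = "".join(
                        "." if j in wild else ch for j, ch in enumerate(window)
                    )
                    support.setdefault(pattern, set()).add(idx)
    return {p: len(s) for p, s in support.items() if len(s) >= min_support}


# Toy sequences (made up for illustration, not real leghemoglobin data).
toy = ["AKLTQE", "GKMTQA", "KVTQLL"]
common = discover(toy, width=4, min_literals=3, min_support=3)
hits = match_positions("K.TQ", toy[0])
```

Here `discover` has to generate and count every candidate before it can report that `K.TQ` is common to all three toy sequences, while `match_positions` only needs to be told the pattern to locate it.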

By: Isidore Rigoutsos, Aris Floratos, Christos Ouzounis (EMBL Cambridge)

Published in: RC20806 in 1997

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

RC20806final.ps
