Preliminary Results On the Discovery of Patterns of Amino Acids Common to Sequences of Leghemoglobins

        Pattern discovery, which is NP-hard, is a considerably more challenging problem than pattern matching, which is solvable in polynomial time. Using Teiresias, a newly developed algorithm, we have explored (see [5]) a number of test cases in order to identify patterns of amino acids that appear across sequences (common motifs) as well as within individual sequences (internal repeats). In that work, only a small subset of the available results was presented, in order to showcase the ability of the algorithm to: (a) validate the approach through the discovery of previously reported patterns; (b) automatically identify highly selective patterns particular to the sequences under consideration; and (c) discover previously unidentified patterns in the well-studied example cases used. One of those example cases was the leghemoglobin family of proteins. In this report, we present the full range of results obtained for this protein family.
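To make the distinction between common motifs and internal repeats concrete, here is a minimal sketch of the *matching* side only (the polynomial-time problem), not of the Teiresias discovery procedure itself. It assumes a simplified pattern syntax in which uppercase letters are residues and `.` matches any single residue, with every pattern starting and ending on a residue. The function names and the toy sequences are hypothetical illustrations, not leghemoglobin data or results from this report.

```python
import re
from typing import Dict, List

# Assumed, simplified pattern syntax: residues A-Z, '.' = any single residue,
# and a pattern must begin and end with a residue.
_PATTERN_SYNTAX = re.compile(r"[A-Z](?:[A-Z.]*[A-Z])?")


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a pattern into a regex that reports overlapping matches.

    The zero-width lookahead lets finditer report every start offset,
    including matches that overlap one another.
    """
    if not _PATTERN_SYNTAX.fullmatch(pattern):
        raise ValueError(f"pattern must start and end with a residue: {pattern!r}")
    return re.compile(f"(?=({pattern}))")


def find_occurrences(pattern: str, sequences: Dict[str, str]) -> Dict[str, List[int]]:
    """Map each sequence name to the 0-based start offsets of all matches."""
    regex = pattern_to_regex(pattern)
    return {name: [m.start() for m in regex.finditer(seq)]
            for name, seq in sequences.items()}


def support(occurrences: Dict[str, List[int]]) -> int:
    """Common-motif view: number of sequences with at least one match."""
    return sum(1 for offsets in occurrences.values() if offsets)


def internal_repeats(occurrences: Dict[str, List[int]]) -> List[str]:
    """Internal-repeat view: sequences in which the pattern occurs more than once."""
    return [name for name, offsets in occurrences.items() if len(offsets) > 1]


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    toy = {"s1": "MKAGCLLAGCK", "s2": "PAWCTT", "s3": "GGGGG"}
    occ = find_occurrences("A.C", toy)
    print(occ)                    # start offsets per sequence
    print(support(occ))           # sequences containing the motif
    print(internal_repeats(occ))  # sequences where it repeats
```

Checking one given pattern in this way is cheap. Discovery is hard because the candidate patterns are not known in advance and their number grows combinatorially with pattern length and the placement of wildcards.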

By: Isidore Rigoutsos, Aris Floratos, Christos Ouzounis (EMBL Cambridge)

Published in: RC20806 in 1997


