"Localize": An Accurate Method for Predicting a Protein's Sub-cellular Location

The computational prediction of a protein’s sub-cellular location directly from its amino acid sequence is a well-known problem in bioinformatics. Together with structural and functional protein annotation methods, it is a valuable tool in high-throughput sequencing projects. In this work, we introduce a new pattern-based method that predicts a protein’s sub-cellular location by analyzing its amino acid sequence. From a training set of amino acid sequences, the method generates both fixed- and variable-length amino acid patterns, which it then uses to place unclassified proteins into one of twelve possible sub-cellular locations. Through a series of experiments, we demonstrate that the new method achieves substantial improvements in both average sub-cellular location accuracy and total accuracy over previously reported approaches. An implementation of the described method is available at: http://cbcsrv.watson.ibm.com/localize.html.
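
To make the general idea concrete, the following is a toy sketch of pattern-based classification and of the two accuracy measures mentioned above. It is not the algorithm of this report: it uses only fixed-length patterns (all substrings of length k), a simple voting rule, and made-up sequences and location labels. The function names (`kmers`, `train`, `predict`, `total_accuracy`, `average_location_accuracy`) are hypothetical, and "average sub-cellular location accuracy" is assumed here to mean the mean of the per-location accuracies.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (fixed-length patterns)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(examples, k=3):
    """Map each pattern to a Counter of the locations of the training
    sequences that contain it. examples is a list of (sequence, location)."""
    table = defaultdict(Counter)
    for seq, loc in examples:
        for pattern in kmers(seq, k):
            table[pattern][loc] += 1
    return table


def predict(table, seq, k=3):
    """Sum the location counts of every known pattern found in seq and
    return the location with the most votes, or None if nothing matches."""
    votes = Counter()
    for pattern in kmers(seq, k):
        votes.update(table.get(pattern, {}))
    return votes.most_common(1)[0][0] if votes else None


def total_accuracy(true, pred):
    """Fraction of all proteins whose location was predicted correctly."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def average_location_accuracy(true, pred):
    """Mean of the per-location accuracies (assumed definition); every
    location counts equally regardless of how many proteins it holds."""
    per_loc = defaultdict(lambda: [0, 0])  # location -> [correct, total]
    for t, p in zip(true, pred):
        per_loc[t][0] += int(t == p)
        per_loc[t][1] += 1
    return sum(c / n for c, n in per_loc.values()) / len(per_loc)


# Made-up training data: sequences and labels are illustrative only.
training = [("AAAKKK", "nuclear"), ("KKKRRR", "nuclear"), ("LLLVVV", "membrane")]
table = train(training)
```

With this toy table, `predict(table, "GKKKG")` votes for "nuclear" because the pattern KKK occurs only in nuclear training sequences. The two measures can differ sharply when locations are unevenly populated: a classifier that is right on every protein of a large class and wrong on a small one scores high on total accuracy but lower on the per-location average.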

By: Aristotelis Tsirigos; Stanislav Polonsky; Kevin C. Miranda; Isidore Rigoutsos

Published in: RC24549 in 2008


