Bioinformatics for Microarrays

Microarrays (or biochips) are perhaps one of the most exciting developments in bioinformatics research. The emerging biochip technology has made it possible to study the expression (activity level) of thousands of genes or proteins simultaneously in a single laboratory experiment. However, to extract relevant biological knowledge from biochip experimental data, it is critical not only to analyze the data but also to cross-reference and correlate these large volumes of data with information available in external biological databases accessible online.

We describe a comprehensive system for knowledge management in bioinformatics, called e2e, in which data generated by biochip experiments can be analyzed for emerging patterns among groups of genes, with additional insights from related analyses such as pathway scores, sequence similarity, and literature text summarization. To the biologist or to biological applications, e2e exposes a common semantic view of the interrelationships among biological concepts in the form of an XML representation called eXpressML. Internally, e2e can use any data integration solution (such as DiscoveryLink, Kleisli, or a natively XML-based one) to retrieve data and return results corresponding to the semantic view. We have implemented an e2e prototype that demonstrates our framework: it allows a biologist to analyze her gene expression data, supplied in GEML or taken from a public site such as Stanford, and to discover knowledge through operations such as querying relevant annotated data represented in eXpressML, using pathway data from KEGG, publication data from Medline, and protein data from SWISS-PROT.

By: Sudeshna Adak, Vishal S Batra, Deo N Bhardwaj, P V Kamesam, Pankaj Kankar, Manish P Kurhekar, Biplav Srivastava

Published in: RI02016 in 2002

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

