CSB2008 Estimating support for protein-protein interaction data with applications to function prediction

Erliang Zeng, Chris Ding, Giri Narasimhan*, Stephen Holbrook

Bioinformatics Research Group (BioRG), School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA. giri@cs.fiu.edu

Proc LSS Comput Syst Bioinform Conf. August, 2008. Vol. 7, p. 73-84.

*To whom correspondence should be addressed.


Almost every cellular process requires the interaction of pairs or larger complexes of proteins. High-throughput protein-protein interaction (PPI) data have been generated using techniques such as yeast two-hybrid systems and mass spectrometry, among others. Such data provide a new perspective for predicting protein functions and for generating protein-protein interaction networks, and many recent algorithms have been developed for these purposes. However, PPI data generated using high-throughput techniques contain a large number of false positives. In this paper, we propose a novel method to evaluate the support for PPI data based on gene ontology information. When the semantic similarity between genes is computed from gene ontology information using Resnik's formula, our results show that the PPI data can be modeled as a mixture model, predicated on the assumption that true protein-protein interactions have higher support than the false positives in the data. Semantic similarity between genes thus serves as a metric of support for PPI data. Taking this one step further, we also propose new function prediction approaches that make use of this support metric. These new function prediction approaches outperform their conventional counterparts. New evaluation methods are also proposed.
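To make the support metric concrete, the following is a minimal sketch of Resnik's similarity, in which two ontology terms are scored by the information content, -log p(t), of their most informative common ancestor. The toy ontology, the term names, and the probabilities are hypothetical and are not taken from the paper. Scoring a gene pair by the maximum over all pairs of their annotated terms is one common convention and is assumed here; the abstract does not state which aggregation the authors use.

```python
import math

# Toy GO-like DAG: each term maps to its parents (hypothetical term names).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}

# Hypothetical annotation probabilities p(t): the fraction of genes annotated
# with term t or any of its descendants. The root always has p = 1.
PROB = {
    "root": 1.0,
    "binding": 0.5,
    "catalysis": 0.4,
    "dna_binding": 0.2,
    "rna_binding": 0.1,
}


def ancestors(term):
    """Return the set containing the term and all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik_terms(t1, t2):
    """Resnik similarity: information content of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(-math.log(PROB[t]) for t in common)


def resnik_genes(terms1, terms2):
    """Gene-level similarity as the maximum over all term pairs (an assumed convention)."""
    return max(resnik_terms(a, b) for a in terms1 for b in terms2)


if __name__ == "__main__":
    # Two genes sharing the fairly specific ancestor "binding".
    print(resnik_genes({"dna_binding"}, {"rna_binding"}))
    # Two genes whose only common ancestor is the root.
    print(resnik_genes({"dna_binding"}, {"catalysis"}))
```

Under the mixture-model assumption described in the abstract, interacting pairs with a high score would be regarded as better supported, while pairs whose only shared ancestor is the root score zero and would be candidates for false positives.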

