Finding Linear Motif Pairs from Protein Interaction Networks: A Probabilistic Approach

Henry C.M. Leung*, M.H. Siu, S.M. Yiu, Francis Y.L. Chin, Ken W.K. Sung

Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong. cmleung2@cs.hku.hk

Proc LSS Comput Syst Bioinform Conf. August, 2007. Vol. 6, p. 111-119.

*To whom correspondence should be addressed.


Finding motif pairs from a set of protein sequences based on protein-protein interaction data is a challenging computational problem. Existing effective approaches usually rely on additional information, such as prior knowledge of protein groupings based on protein domains. In reality, this kind of knowledge is not always available, so novel approaches that do not require it are highly desirable. Recently, Tan et al. [10] proposed such an approach. However, their approach has two problems. First, its scoring function (based on χ² testing) is not adequate: random motif pairs may have higher scores than the correct ones. Second, it is not scalable: it may take days to process a set of 5,000 protein sequences with about 20,000 interactions. In this paper, our contribution is two-fold. We first introduce a new scoring method, which is shown to be more accurate than the χ-score used in [10]. We then present two efficient algorithms, an exact algorithm and a heuristic version of it, to solve the problem of finding motif pairs. Based on experiments on real datasets, we show that our algorithms are efficient and can accurately locate the motif pairs. We have also evaluated the sensitivity and efficiency of our heuristic algorithm using simulated datasets; the results show that the algorithm is very efficient with reasonably high sensitivity.

