CSB2009 MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement

Guanqun Shi*, Liqing Zhang, Tao Jiang

Department of Computer Science, University of California, Riverside, CA 92521, USA. jiang@cs.ucr.edu

Proc LSS Comput Syst Bioinform Conf. August, 2009. Vol. 8, p. 13-24.

*To whom correspondence should be addressed.


Ortholog assignment is a critical and fundamental problem in comparative genomics, since orthologs are considered to be functional counterparts in different species and can be used to infer molecular functions of one species from those of other species. MSOAR is a recently developed high-throughput system for assigning orthologs between closely related species on a genome scale. It attempts to reconstruct the evolutionary history of the input genomes in terms of genome rearrangement and gene duplication events. It assumes that a gene duplication event inserts the duplicated gene into the genome of interest at a random location (i.e., the random duplication model). In practice, however, biologists believe that genes are often duplicated by tandem duplication, where the duplicated gene is located next to the original copy (i.e., the tandem duplication model). In this paper, we develop MSOAR 2.0, an improved system for ortholog assignment. For a pair of input genomes, the system first focuses on the tandemly duplicated genes of each genome and tries to identify among them those that were duplicated after speciation (i.e., the so-called inparalogs), using a simple phylogenetic tree reconciliation method. For each such set of tandemly duplicated inparalogs, all but one gene are deleted from the genome concerned (because they cannot possibly appear in any ortholog pair), and MSOAR is then invoked on the reduced genomes. Using both simulated and real data experiments, we show that MSOAR 2.0 achieves better sensitivity and specificity than MSOAR. In comparison with two well-known genome-scale ortholog assignment tools, the InParanoid program and the Ensembl ortholog database, MSOAR 2.0 shows the highest sensitivity. Although the specificity of MSOAR 2.0 is slightly worse than that of InParanoid in the real data experiments, it is better than that of InParanoid in the simulation tests. These experimental results demonstrate that MSOAR 2.0 is a highly accurate tool for ortholog assignment.
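
The preprocessing step described above (collapse each run of tandemly duplicated inparalogs to a single gene before invoking MSOAR) can be illustrated with a minimal Python sketch. This is not the MSOAR 2.0 implementation: the gene representation, the function name collapse_tandem_inparalogs, and the predicate duplicated_after_speciation are all hypothetical, and the tree reconciliation test is stubbed out as a lookup. The abstract does not say which copy is retained; keeping the first copy in each run is an assumption made here for illustration.

```python
from typing import Callable, List, Tuple

# (gene_id, family_id); both labels are hypothetical, for illustration only.
Gene = Tuple[str, str]


def collapse_tandem_inparalogs(
    genome: List[Gene],
    duplicated_after_speciation: Callable[[str, str], bool],
) -> List[Gene]:
    """Keep one representative per run of adjacent same-family inparalogs."""
    kept: List[Gene] = []
    for gene_id, family in genome:
        if kept:
            prev_id, prev_family = kept[-1]
            # Tandem duplicate: same family as the last retained gene, and the
            # (stubbed) reconciliation test says it arose after speciation.
            if family == prev_family and duplicated_after_speciation(prev_id, gene_id):
                continue  # assumption: the first copy of the run is retained
        kept.append((gene_id, family))
    return kept


# Toy genome: a1, a2, a3 form a tandem array; a4 is a dispersed copy.
genome = [("a1", "A"), ("a2", "A"), ("a3", "A"), ("b1", "B"), ("a4", "A")]
recent = {("a1", "a2"), ("a1", "a3")}  # stand-in for tree reconciliation
reduced = collapse_tandem_inparalogs(genome, lambda x, y: (x, y) in recent)
print(reduced)
```

In this toy input, a2 and a3 are dropped because they sit next to a1 and are flagged as post-speciation duplicates, while a4 is kept because it is not adjacent to another member of its family. The reduced gene list is what would then be passed on to the rearrangement-based assignment.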

