Prediction of Transcription Start Sites Based on Feature Selection Using AMOSA

Xi Wang, Sanghamitra Bandyopadhyay, Zhenyu Xuan, Xiaoyue Zhao, Michael Q. Zhang, Xuegong Zhang*

Bioinformatics Division, TNLIST and Dep. of Automation, Tsinghua Univ., Beijing 100084, China. zhangxg@tsinghua.edu.cn

Proc LSS Comput Syst Bioinform Conf. August, 2007. Vol. 6, p. 183-193.

*To whom correspondence should be addressed.


Identifying transcription start sites (TSSs) is a primary and important step toward understanding the regulation of gene expression. With the aim of improving computational prediction accuracy, we focus on the most challenging task, i.e., identifying TSSs within 50 bp in non-CpG-related promoter regions. Because non-CpG-related promoters are diverse, a large number of features are extracted. Effective feature selection can reduce noise, improve prediction accuracy, and reveal biologically meaningful intrinsic properties. In this paper, a newly proposed multi-objective optimization method based on simulated annealing, Archive Multi-Objective Simulated Annealing (AMOSA), is integrated with Linear Discriminant Analysis (LDA) to yield a combined feature selection and classification system. This system is found to be comparable to, and often better than, several existing methods in terms of different quantitative performance measures.

