CSB2009 Improved small molecule activity determination via centroid nearest neighbors classification

Phuong Dao*, Farhad Hormozdiari, Hossein Jowhari, S. C. Sahinalp, Kendall Byler, Artem Cherkasov, Zehra Cataltepe

School of Computing Science, SFU, Burnaby, BC, Canada. pdao@cs.sfu.edu

Proc LSS Comput Syst Bioinform Conf. August, 2009. Vol. 8, p. 251-262.

*To whom correspondence should be addressed.


Small molecules that alter biological processes or disease states are of significant interest. In silico drug discovery commonly uses measures of structural similarity to identify the right small molecule for a given task. Because explicit structural similarity determination is very difficult, modern chemoinformatics solutions typically use quantitative structure-activity relationships (QSAR), in which small molecules are described by real-valued descriptor arrays. In this paper we show how to identify the bioactivity exhibited by compounds of interest through Centroid-based Nearest Neighbor (CBNN) classifiers, which require selecting, from a given training set, the best representative compounds for each specific bioactivity. For that purpose we introduce the Combinatorial Centroid Nearest Neighbor (CCNN) method, which determines the representative compounds so that no classification errors occur on the training set. On the data sets to which we applied it, covering three different bioactivities, CCNN provided the highest accuracy among all classifiers we tested.
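
To make the centroid-based idea concrete, the following minimal Python sketch classifies each compound by its nearest representative and picks representatives so that the training set is classified without error. It is an illustration only, not the CCNN method of the paper: the selection loop is a simple greedy procedure in the style of Hart's condensed nearest neighbor rule, swapped in here because the abstract does not describe the combinatorial algorithm. The function names (`predict`, `select_representatives`), the use of Euclidean distance, and the two-dimensional toy descriptors are all assumptions made for this example.

```python
import numpy as np


def predict(centroids, centroid_labels, X):
    """Give each row of X the label of its nearest centroid (Euclidean)."""
    # Pairwise squared distances, shape (n_samples, n_centroids).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroid_labels[d2.argmin(axis=1)]


def select_representatives(X, y):
    """Greedy, Hart-style selection (an assumption, NOT the paper's CCNN):
    keep adding a misclassified compound as a representative until the
    training set is classified without error."""
    chosen = [0]  # seed with the first compound
    while True:
        pred = predict(X[chosen], y[chosen], X)
        wrong = np.flatnonzero(pred != y)
        if wrong.size == 0:
            return np.array(chosen)
        nxt = int(wrong[0])
        if nxt in chosen:
            # Only possible when identical descriptors carry different labels.
            raise ValueError("conflicting labels for identical descriptors")
        chosen.append(nxt)


# Hypothetical toy data: 2-D descriptor arrays, labels 0 = inactive, 1 = active.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [0.5, 0.4]])
y = np.array([0, 0, 1, 1, 1])

reps = select_representatives(X, y)
train_pred = predict(X[reps], y[reps], X)
new_pred = predict(X[reps], y[reps], np.array([[0.95, 0.9]]))
```

On this toy set the loop ends with no training errors, which mirrors the zero-training-error property the abstract states for CCNN. The greedy loop makes no attempt to minimize the number of representatives.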

