Nonnegative least square - a new look into SAGE data

Erliang Zeng*, Mitsunori Ogihara

Department of Computer Science, Center for Computational Science, University of Miami, Miami, FL 33146, USA. zeng@cs.miami.edu

Proc LSS Comput Syst Bioinform Conf. August, 2009. Vol. 8, p. 151-161.

*To whom correspondence should be addressed.


Serial Analysis of Gene Expression (SAGE) is a technology for quantifying gene expression by sequencing short stretches (tags) of DNA that are produced by reverse transcription and enzymatic restriction. A major issue in SAGE data analysis is tag ambiguity: a single tag can match multiple genes, and a single gene can match multiple tags. This ambiguity produces groups of interrelated quantitative constraints among tag counts and gene expression values. We propose resolving this web of relations between tags and genes with the nonnegative least squares (NNLS) method, and we present a fast algorithm for the task. The effectiveness of the method is confirmed by examining a published data set that involves SAGE and a method called GLGI. The method is then applied to a SAGE data set for a human neurodegenerative disease. The experimental results show that more reliable gene expression values can be inferred from SAGE tags using our method, suggesting that the method is powerful for exploring gene expression patterns and for identifying candidate genes from SAGE data that potentially contribute to susceptibility to complex human diseases.
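
To make the NNLS formulation concrete, the following is a minimal sketch, not the fast algorithm presented in the paper. It assumes a simple model in which the observed count of each tag is the sum of the expression values of the genes it matches, so that a binary tag-by-gene matrix A and a vector of tag counts b give the problem: minimize ||Ax - b||^2 subject to x >= 0. The solver is a plain projected gradient descent chosen for brevity; the function name, the matrix, and the counts are all hypothetical.

```python
def nnls_projected_gradient(A, b, iterations=5000):
    """Approximately minimize ||A x - b||^2 subject to x >= 0.

    A is a list of rows (tags) by columns (genes); b holds the tag counts.
    Illustrative solver only: plain projected gradient descent, not the
    algorithm from the paper. Assumes A has at least one nonzero entry.
    """
    m, n = len(A), len(A[0])
    # The squared Frobenius norm of A bounds the largest eigenvalue of A^T A,
    # so its reciprocal is a safe step size for gradient descent.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iterations):
        # residual = A x - b
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        # gradient of 0.5 * ||A x - b||^2 is A^T residual
        gradient = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by projection onto the nonnegative orthant.
        x = [max(0.0, x[j] - step * gradient[j]) for j in range(n)]
    return x


# Hypothetical toy data: rows are tags, columns are genes.
# Tag 1 matches gene 1 only; tag 2 matches genes 1 and 2; tag 3 matches genes 2 and 3.
tag_gene = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
]
tag_counts = [10, 25, 20]

expression = nnls_projected_gradient(tag_gene, tag_counts)
print([round(v, 3) for v in expression])
```

In this toy case the constraints are consistent with expression values of 10, 15, and 5 for the three genes. When the counts are inconsistent, the nonnegativity constraint keeps an estimate from going negative and the remaining values absorb the mismatch in the least squares sense.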

