Selection of Putative Cis-regulatory Motifs Through Regional and Global Conservation
Youlian Pan, Brandon Smith, Hung Fang, Fazel A. Famili, Marianna Sikorska, and Roy Walker
Institute for Information Technology, National Research Council Canada
Cis-regulatory motifs are the binding sites of transcription factors. They provide information crucial to
the understanding of regulatory mechanisms of gene expression. These motifs are often overrepresented in
promoters and exhibit biases in sub-promoter regions (SPRs). Many probabilistic algorithms, such as hidden
Markov models and Gibbs sampling, have been used to predict such motifs. However, they tend to generate a substantial
number of false positives. The challenge is to find the needles, the true regulatory motifs, in a
haystack of false positives. This poster presents a novel algorithm, MotifFilter, that performs a comparative analysis
of motif frequencies between random genomic and promoter regions, and among SPRs. This approach is based on
nucleotide probability distributions in various regions of a genome. Representation indices for putative motifs,
found by methods such as hidden Markov models or Gibbs sampling, are generated based on these nucleotide probability
distributions and motif frequencies in each SPR. By comparing the representation indices of different SPRs and
genomic sequences, a substantial number of false positives may be filtered out, while many valuable putative
motifs are retained. MotifFilter runs in O(mn) time, where m is the motif length and n is the sequence length, which is
relatively fast compared to other probabilistic algorithms, and it adds very little computational overhead. This approach has been
successfully applied to a genome-wide survey of putative cAMP-response elements (CREs) in which 20 out of 144
putative CRE motifs found by a profile hidden Markov model were retained. Fourteen of these were confirmed
either by a TRANSFAC consensus or by the literature.
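
The abstract does not give the formula for the representation index, so the following is only a minimal sketch of how such a filter might look. It assumes the index is the ratio of observed motif occurrences to the count expected under an independent (zero-order) nucleotide model estimated from the same region. The function names, the `min_ratio` threshold, and the rule for a zero background index are all hypothetical and are not taken from MotifFilter itself.

```python
from collections import Counter


def nucleotide_probs(seq):
    """Zero-order nucleotide probabilities estimated from one region."""
    counts = Counter(seq)
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    return {b: counts[b] / total for b in "ACGT"}


def count_occurrences(motif, seq):
    """Naive sliding-window count, O(m*n); overlapping matches are counted."""
    m = len(motif)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == motif)


def expected_count(motif, probs, seq_len):
    """Expected occurrences if nucleotides were drawn independently."""
    p = 1.0
    for base in motif:
        p *= probs[base]
    return p * max(seq_len - len(motif) + 1, 0)


def representation_index(motif, seq):
    """Observed/expected ratio (an assumed definition, not from the source)."""
    expected = expected_count(motif, nucleotide_probs(seq), len(seq))
    return count_occurrences(motif, seq) / expected if expected > 0 else 0.0


def keep_motif(motif, promoter, background, min_ratio=2.0):
    """Retain a motif whose promoter index exceeds the background index.

    min_ratio is a hypothetical threshold chosen only for illustration.
    """
    ri_promoter = representation_index(motif, promoter)
    ri_background = representation_index(motif, background)
    if ri_background == 0.0:
        # Motif absent from the background: keep it if it occurs in promoters.
        return ri_promoter > 0.0
    return ri_promoter / ri_background >= min_ratio
```

Under these assumptions, the same comparison could be repeated for each sub-promoter region by passing that SPR's sequence in place of `promoter`, so that regional biases show up as differences between the per-SPR indices.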