Exploring Genomic Context Patterns for Rhodobacter sphaeroides in the HERBE Knowledge Discovery Environment
Heidi J. Sofia, Abigail L. Corrigan, Kyle R. Klicker, George Chin, and Eric G. Stephan
Pacific Northwest National Lab
The complexity and scale of biological data make sophisticated information strategies increasingly essential
for investigators. We have built a powerful knowledge discovery resource using genomic context methods,
interactive visualization strategies, and advanced computational environment technologies. The Heuristic
Entity Relationship Building Environment (HERBE) is a research platform for advanced database technologies
that fuses data management solutions with knowledge management components to support the dynamic capture of
concepts and observations as biologists explore large-scale data. Visualization strategies such as Similarity
Box increase the ability of biologists to interact with large-scale computational results and evaluate
relationships based on natural reasoning processes. We have applied these knowledge methods in the exploration
of complex genome relationships using genomic context data mining in an effort to reveal the biological
characteristics of particular organisms such as Rhodobacter sphaeroides. For example, conserved gene neighbor
patterns are gene linkages in the chromosome that are conserved across species. Using HERBE, we extracted the
complete set of gene neighbor patterns for Rhodobacter sphaeroides by mapping data structures for chromosomal
contiguity against sequence similarity results. We then organized these gene neighbor patterns in a Similarity
Box visualization to enable biologists to explore the results. Phylogenetic profiles for the gene neighbor
patterns were exported and clustered to provide a map of neighbor patterns organized by potential significance.
This view links gene neighbor sets into larger groups of proteins which may be regulons or pathways.
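The mapping of chromosomal contiguity against sequence similarity described above can be illustrated with a minimal sketch. It is not the HERBE implementation: the function names, the toy gene identifiers, and the `hits` dictionary (one best similarity hit per gene per genome) are all assumptions made for illustration. The sketch records, for each pair of adjacent reference genes, the set of genomes in which the similar genes are also adjacent, which serves as a simple phylogenetic profile for that neighbor pattern.

```python
from collections import defaultdict


def adjacent_pairs(gene_order):
    """Return the unordered pairs of genes that are direct chromosomal neighbors."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


def conserved_neighbor_patterns(ref_order, other_orders, hits):
    """Map each adjacent reference gene pair to the genomes where it is conserved.

    ref_order    : reference gene IDs in chromosomal order
    other_orders : dict of genome name -> gene IDs in chromosomal order
    hits         : dict of (reference gene, genome name) -> similar gene ID
                   (hypothetical, pre-computed sequence similarity results)
    """
    other_pairs = {name: adjacent_pairs(order) for name, order in other_orders.items()}
    profiles = defaultdict(set)
    for a, b in zip(ref_order, ref_order[1:]):
        for genome, pairs in other_pairs.items():
            hit_a = hits.get((a, genome))
            hit_b = hits.get((b, genome))
            if hit_a is None or hit_b is None or hit_a == hit_b:
                continue
            # The pattern is conserved if the similar genes are neighbors too.
            if frozenset((hit_a, hit_b)) in pairs:
                profiles[(a, b)].add(genome)
    return dict(profiles)


# Toy data: all identifiers below are invented for the example.
ref_order = ["rs1", "rs2", "rs3", "rs4"]
other_orders = {
    "genomeA": ["a1", "a2", "a9", "a3"],
    "genomeB": ["b5", "b2", "b1", "b3"],
}
hits = {
    ("rs1", "genomeA"): "a1", ("rs2", "genomeA"): "a2", ("rs3", "genomeA"): "a3",
    ("rs1", "genomeB"): "b1", ("rs2", "genomeB"): "b2", ("rs3", "genomeB"): "b3",
}

patterns = conserved_neighbor_patterns(ref_order, other_orders, hits)
for pair, genomes in patterns.items():
    print(pair, sorted(genomes))
```

In this toy case only the pair (rs1, rs2) is reported, because it is the only reference pair whose similar genes are adjacent in another genome. The resulting genome sets could then be clustered to group neighbor patterns with similar profiles.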
RepeatAssembler: A Package for Annotation of Full-length Repetitive DNA Sequences in Fungal Genomes
Farman, Mark L. (1); Gilkerson, Joshua W. (2); Jaromczyk, Jerzy W. (2); and Staben, Chuck (3)
(1) Department of Plant Pathology, University of Kentucky; (2) Department of Computer Science,
University of Kentucky; (3) Department of Biology, University of Kentucky
RepeatAssembler is an attempt to automate the identification of full-length consensus sequences for
repetitive DNA fragments. Although there are packages with similar objectives, e.g. RECON
(http://www.genetics.wustl.edu/eddy/recon/), none fully meets the needs of researchers in comparative
genomics. Joining short repeating elements into full-length repeats is particularly intricate, and handling
this task properly is the main goal and the strength of our solution. RepeatAssembler analyzes as few
sequences as possible to screen out all repetitive sequences, builds a set of non-redundant repeats,
and reports the full-length consensus. The steps are: 1. Divide: the genome is divided into segments.
2. Evaluate: using the BLAST data, each segment is evaluated with respect to the likelihood that it is
part of a repeat. 3. Add and Prune: iteratively, the best segments are chosen and added to the set of
repeats, possibly extending an existing repeat. Each time a segment is chosen, other segments are
removed from consideration. Input is accepted in multiple formats, including FASTA and BLAST reports.
Output can be given in either an easily understandable description or one of several machine-readable
formats including FASTA and the General Feature Format (GFF). These output formats allow our application
to integrate easily with other software. Thanks to full integration with existing tools, including NCBI
BLAST and the Generic Genome Browser (http://www.gmod.org/ggb/), RepeatAssembler provides an
intuitive interface that allows the user to quickly identify biologically relevant information.
RepeatAssembler consistently delivers repetitive sequences that agree with known full-length repeats
for several fungal genomes.
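The Divide, Evaluate, and Add and Prune steps can be sketched as a simple greedy procedure. The abstract does not specify the scoring or pruning rules, so everything below is an assumption made for illustration: segments have a fixed size, a segment's score is the number of BLAST hits overlapping it, and the segments pruned after each choice are those aligned to the chosen segment. The `hits` input is a hypothetical, pre-parsed list of self-alignments given as ((query_start, query_end), (subject_start, subject_end)).

```python
def divide(genome_length, segment_size):
    """Step 1 (Divide): split the genome into fixed-size (start, end) segments."""
    return [(s, min(s + segment_size, genome_length))
            for s in range(0, genome_length, segment_size)]


def overlaps(segment, interval):
    """True if two half-open intervals share at least one position."""
    return segment[0] < interval[1] and interval[0] < segment[1]


def evaluate(segments, hits):
    """Step 2 (Evaluate): score segments by overlapping hits (assumed scoring rule)."""
    scores = {seg: 0 for seg in segments}
    similar = {seg: set() for seg in segments}
    for query, subject in hits:
        q_segs = [s for s in segments if overlaps(s, query)]
        s_segs = [s for s in segments if overlaps(s, subject)]
        for seg in q_segs + s_segs:
            scores[seg] += 1
        # Remember which segments are aligned to each other, for pruning.
        for q in q_segs:
            for s in s_segs:
                if q != s:
                    similar[q].add(s)
                    similar[s].add(q)
    return scores, similar


def add_and_prune(scores, similar, min_score=1):
    """Step 3 (Add and Prune): greedily pick the best segment, drop its copies."""
    candidates = {seg for seg, score in scores.items() if score >= min_score}
    repeats = []
    while candidates:
        # Highest score first; ties are broken in favor of the leftmost segment.
        best = max(candidates, key=lambda seg: (scores[seg], -seg[0]))
        candidates.discard(best)
        candidates -= similar[best]  # prune segments aligned to the chosen one
        start, end = best
        remaining = []
        for rep in repeats:
            if rep[1] == start:      # chosen segment extends this repeat rightward
                start = rep[0]
            elif rep[0] == end:      # chosen segment extends this repeat leftward
                end = rep[1]
            else:
                remaining.append(rep)
        repeats = remaining + [(start, end)]
    return sorted(repeats)


# Toy example: positions 10-30 are repeated at positions 40-60.
segments = divide(genome_length=60, segment_size=10)
scores, similar = evaluate(segments, hits=[((10, 30), (40, 60))])
repeats = add_and_prune(scores, similar)
print(repeats)
```

In the toy example the two segments covering positions 10-30 are joined into one repeat, while the segments of the second copy are pruned, so the repeat is reported once. Building a consensus sequence from the copies is not covered by this sketch.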