AM1
Introduction to Evolutionary and Functional Genomic Analysis
Cristian Castillo-Davis, Post-doc, Harvard University
Genomes are being released at a dizzying pace and high-throughput methods
are creating mountains of comparative data. How can we integrate and use
these resources in a productive and meaningful way?
In this tutorial we will cover some of the basic concepts involved in
comparative genomic and functional genomic analysis and learn a few basic
techniques that should help any worker get started analyzing data from a
genomic perspective. Attention will be paid to different methods of
measuring protein evolution, building evolutionary trees, and determining
homology (orthology and paralogy). Integration of evolutionary and
genome-scale data with gene annotations, microarray data, and EST data in
a statistical framework will be a major focus. In particular we will learn
how to use the simple but versatile genomic tool GeneMerge
(http://www.oeb.harvard.edu/hartl/lab/publications/GeneMerge/index.html).
No programming knowledge and little biological background are assumed. The
amount of time spent on each topic will be determined by participant
interest.
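As a rough illustration of what integrating gene annotations "in a statistical framework" can look like, the sketch below tests whether an annotation term is over-represented in a study set of genes, using a hypergeometric tail probability and a Bonferroni correction. This is a generic over-representation test, not GeneMerge's code; the function names and all of the numbers are hypothetical.

```python
from math import comb

def hypergeom_tail(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

def bonferroni(p_value, n_tests):
    """Conservative multiple-testing correction, capped at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical example: 6000 genes in the genome, 120 carry the term,
# and 6 of the 50 genes in the study set carry it.
p = hypergeom_tail(population=6000, annotated=120, sample=50, hits=6)
print(p, bonferroni(p, n_tests=200))
```

A small corrected value would suggest the term is enriched in the study set beyond what random sampling explains; the number of tests (200 here) is simply assumed to be the number of terms examined.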
Cristian Castillo-Davis received his Ph.D. in Organismic and Evolutionary
Biology from Harvard University and is currently a post-doctoral research
fellow in the Department of Statistics (also at Harvard) with Jun S. Liu.
His current research is aimed at uncovering the mechanics of
cis-regulatory sequence evolution and gene network evolution using both
laboratory and computational approaches.
AM2
Tandem Mass Spectrometry in Proteomics
Ming Li; Bin Ma, Assistant Professor and a Tier 2 Canada Research Chair
in Bioinformatics, Department of Computer Science, University of Western Ontario
This tutorial introduces the applications of mass spectrometry and tandem mass
spectrometry in proteomics, including protein identification and quantitative protein analysis.
The tutorial will start with an overview of the different
types of mass spectrometers and tandem mass spectrometers. Then different
computational methods for protein identification, including database search and
de novo sequencing, will be discussed. The Isotope-Coded Affinity Tag (ICAT)
analysis for protein quantitation will be briefly introduced at the end.
Bin Ma is currently an Assistant Professor and a Tier 2 Canada Research Chair
in Bioinformatics in the Department of Computer Science at the University of Western Ontario.
He is also the CTO of Bioinformatics Solutions Inc. He received his Ph.D. degree from
Peking University in 1999, and was a recipient of the Ontario Premier's Research Excellence Award in
2003 for his research in bioinformatics.
AM3
How to Use the Genome Browsers to Get the Most Out of Public Genomes
Daryl Thomas, Ph.D. candidate, University of California, Santa Cruz
The field of bioinformatics is playing an increasingly large role in the
study of fundamental biomedical problems due to the explosion of sequence,
structural, and functional information available to researchers. The
challenge facing biologists, especially in light of the vast amount of data
being produced by the Human Genome Project and other large scale efforts,
will be to analyze such information to reveal previously unknown
relationships with respect to gene and protein structure and function. The
primary aim of this session is to expose scientists with minimal background
in bioinformatics to the methods used for browsing and analyzing the vast
amount of publicly available data. The session will include case-driven
demonstrations of freely available online browsers and the development of
custom tracks to display in-house data in the same context. The practical
use of these resources will be emphasized through live demonstrations where
possible.
Mr. Thomas is focused on integrating human variation data with comparative
genomics in the UCSC Genome Bioinformatics group. He has developed the
human variation tracks for the UCSC browser, is working toward improved
multiple sequence alignment algorithms, and is developing the repository for
the ENCODE community. Related previous work includes detection and
genotyping of polymorphisms at Affymetrix and Perlegen, studies of neuronal
plasticity at UC Davis, and the cloning and characterization of several
transcription factors at HHMI. Mr. Thomas holds a B.S. in chemistry from
Carnegie Mellon University and an M.S. in computer science from the
University of California at Santa Cruz.
AM4
Computational Genetics: Haplotype Inference and Applications in Human Disease Gene Mapping
Tianhua Niu, Assistant Professor of Medicine, Harvard Medical School, and Director of Bioinformatics at the Division of
Preventive Medicine, Dept. of Medicine, Brigham and Women's Hospital
With the advent of the international HapMap project, there has been a
surging interest in tackling statistical challenges in haplotype phasing
using genotype data of multiple linked single nucleotide polymorphisms. In
this tutorial, I'll cover the following topics:
(a) Haplotype inference. I'll illustrate the idea of using a
partition-ligation (PL) approach in haplotype inference using both
Bayesian and Expectation-Maximization frameworks. Simulated and real-world
data are used to compare the performance of various statistical haplotype
inference algorithms. Moreover, I'll introduce a new concept, GenoSpectrum
(based on probabilistic genotype calls); a new genotype
clustering algorithm based on a t-mixture model, called GeneScore; and
a new haplotype phasing algorithm, GS-EM, that can handle ambiguous
genotype data.
(b) Linkage Disequilibrium (LD) Analysis. I'll illustrate the use of
permutation tests, likelihood ratio tests, logistic regression models, and
Bayesian statistical models in performing haplotype-based LD analyses,
with applications in preterm delivery, Alzheimer disease, and secondary
hyperparathyroidism.
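To make the Expectation-Maximization idea in (a) concrete, here is a minimal sketch of plain EM haplotype-frequency estimation for two biallelic SNPs. It is not the partition-ligation method or GS-EM; the function name, the genotype coding (counts of allele "1" at each site), and the fixed iteration count are assumptions made for illustration. With two SNPs, only the double heterozygote is phase-ambiguous, so the E-step splits it between the two possible haplotype pairs.

```python
def em_haplotype_freqs(genotypes, n_iter=50):
    """Estimate frequencies of haplotypes 00, 01, 10, 11 from unphased
    two-SNP genotypes given as (g1, g2) with each g in {0, 1, 2}."""
    if not genotypes:
        raise ValueError("need at least one genotype")
    haps = ("00", "01", "10", "11")
    freq = {h: 0.25 for h in haps}  # uniform starting point
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: double heterozygote is either 00/11 or 01/10.
                cis = freq["00"] * freq["11"]
                trans = freq["01"] * freq["10"]
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                counts["00"] += w
                counts["11"] += w
                counts["01"] += 1.0 - w
                counts["10"] += 1.0 - w
            else:
                # Phase is unambiguous when at most one site is heterozygous.
                hap_a = f"{int(g1 == 2)}{int(g2 == 2)}"
                hap_b = f"{int(g1 >= 1)}{int(g2 >= 1)}"
                counts[hap_a] += 1.0
                counts[hap_b] += 1.0
        # M-step: expected haplotype counts divided by chromosome count.
        n_chrom = 2 * len(genotypes)
        freq = {h: counts[h] / n_chrom for h in haps}
    return freq

# Hypothetical sample: homozygotes plus two phase-ambiguous individuals.
sample = [(2, 2)] * 5 + [(0, 0)] * 5 + [(1, 1)] * 2
print(em_haplotype_freqs(sample))
```

With many linked SNPs the number of possible haplotypes grows exponentially, which is the practical reason for divide-and-combine strategies such as the partition-ligation approach mentioned above.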
I'll present my vision of the future directions of haplotype analyses in
the concluding remarks.
Dr. Tianhua Niu is an Assistant Professor of Medicine at the Harvard
Medical School, and Director of Bioinformatics at the Division of
Preventive Medicine, Dept. of Medicine, Brigham and Women's Hospital. Dr.
Niu received his doctoral degree in Biologic Sciences at Harvard
University, trained jointly in molecular genetics and genetic epidemiology
in 1998. Dr. Niu also holds an M.S. degree in Computer Science from
Northeastern University, and is a member of IEEE Computer Society. Dr. Niu
did his post-doctoral training in statistical genetics and bioinformatics,
and has authored/co-authored over fifty publications. Dr. Niu and his
colleagues developed novel statistical methodologies for genetic analyses,
including POLYMORPHISM, HAPLOTYPER, PLEM, GeneScore, GSEM, and several
visualization bioinformatics software packages such as SeqVIST.
AM5
Introduction to Dynamic Programming and Its Applications to Bioinformatics
Robert Edgar, Visiting Scholar, UC Berkeley
Dynamic programming (DP) is a widely used technique in computational
biology and is an essential skill for anyone with an interest in algorithm development
or maintenance. DP is fundamental to most sequence comparison algorithms, including
BLAST, CLUSTALW, whole-genome aligners, hidden Markov models and many other applications.
This tutorial will introduce both the mathematical and practical aspects of DP with a
focus on developing a solid understanding of the basic concepts, giving students the
ability to understand advanced textbooks and the literature. Starting with the most
fundamental sequence comparison algorithm, computing the edit distance of a pair of
strings, the course will cover global and local alignment (Needleman-Wunsch and
Smith-Waterman algorithms), affine gap penalties, hidden Markov models (Viterbi algorithm),
profile-profile and multiple alignment, and whole-genome alignment. For each type of
algorithm, we will discuss examples of well-known bioinformatics programs, such as BLAST,
CLUSTALW, and HMMER, for which source code is available. Students are expected to have
a working knowledge of programming in the C language; the relevant biology will be briefly
covered as needed.
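For readers who want a preview of the starting point named above, the edit distance of a pair of strings, here is a minimal sketch of the standard DP recurrence. It is written in Python for brevity rather than the C used in the course; the function name and the unit costs for insertion, deletion, and substitution are assumptions of this sketch.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b with unit costs,
    computed row by row so only two rows are kept in memory."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]

print(edit_distance("GATTACA", "GCATGCT"))
```

Global and local alignment follow the same table-filling pattern, with the unit costs replaced by substitution scores and gap penalties and, for alignment itself, a traceback through the table.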
Robert C. (Bob) Edgar is a Visiting Scholar at UC Berkeley. Bob received a
Ph.D. in high-energy theoretical physics from University College London in 1982. He
subsequently founded and ran a software company that was sold to Intel Corp. in 1999.
In 2002 he retired from Intel and began work in computational biology in collaboration
with scientists at UC Berkeley and elsewhere. His work has primarily focused on sequence
alignment methods with a particular interest in hidden Markov models and multiple alignment,
leading to seven published papers over the past two years.
PM6
Bioinformatics: The Machine Learning Approach
Pierre Baldi, Professor, University of California, Irvine
Machine learning approaches play a significant role in bioinformatics due to
the abundance of highly variable data and the lack of comprehensive theories. We will provide
a brief overview of machine learning approaches in bioinformatics including:
- The Bayesian statistical framework for modeling and induction as
the common foundation for all machine learning and data mining algorithms.
- Some of the main model classes, such as neural networks, hidden Markov
models, Bayesian networks and graphical models, stochastic context-free grammars.
- Examples of specific applications such as:
  - neural networks for the prediction of protein functional sites and secondary and tertiary structure;
  - hidden Markov models of biological sequences for database searches, multiple alignments,
    pattern discovery, and gene finding.
Reference:
P. Baldi and S. Brunak, "Bioinformatics: the Machine Learning Approach,"
MIT Press, second edition 2001.
Pierre Baldi is a Professor in the School of Information and Computer Science and the Department
of Biological Chemistry and the Director of the Institute for Genomics and Bioinformatics at the
University of California, Irvine. Born and raised in Europe, he received his PhD from the California
Institute of Technology in 1986. From 1986 to 1988 he was a postdoctoral fellow at the University
of California, San Diego. From 1988 to 1995 he held faculty and member of the technical staff
positions at the California Institute of Technology and at the Jet Propulsion Laboratory. He was
CEO of a startup company from 1995 to 1999 and joined UCI in 1999. He is the recipient of a 1993
Lew Allen Award at JPL and a Laurel Wilkening Faculty Innovation Award at UCI. Dr. Baldi has written
over 100 research articles and four books:
- Modeling the Internet and the Web--Probabilistic Methods and Algorithms, Wiley, (2003);
- DNA Microarrays and Gene Regulation--From Experiments to Data Analysis and Modeling, Cambridge University Press, (2002);
- The Shattered Self--The End of Evolution, MIT Press, (2001);
- Bioinformatics: the Machine Learning Approach, MIT Press, Second Edition (2001).
His research focuses on AI, machine learning, and bioinformatics.
PM7
Using dChip for Microarray and SNP Chip Data Analysis
Yu Guo, Ph.D. candidate, Harvard School of Public Health
DNA-Chip Analyzer (dChip) is a software package implementing model-based
expression analysis of oligonucleotide arrays (Li and Wong 2001a) and
several high-level analysis procedures. The model-based approach allows
probe-level analysis on multiple arrays. By pooling information across
multiple arrays, it is possible to assess standard errors for the
expression indexes. This approach also allows automatic probe selection
in the analysis stage to reduce errors due to cross-hybridizing probes
and image contamination. High-level analysis in dChip includes
comparative analysis and hierarchical clustering. Also see the
comparison with Affy MAS software.
Topics will include brief tutorials on the functions of dChip, and using
dChip for microarray and SNP analysis. A live demonstration can be
arranged.
http://biosun1.harvard.edu/complab/dchip/
Yu Guo is a third-year Ph.D. student in the Department of Biostatistics, Harvard
School of Public Health. Her general area of interest is microarray data
analysis, under the guidance of Dr. Cheng Li, the author of the dChip
software. Her work involves comparative genomic analysis of whole
genomes, normalization issues in mRNA microarray analysis, and
correlation of microarray data with clinical variables.
PM8
Discovering regulatory networks from gene expression and promoter sequence
Eran Segal, Ph.D. candidate, Stanford University
Genomic datasets, spanning many organisms and data types, are rapidly being
produced, creating new opportunities for understanding the molecular
mechanisms underlying human disease, and for studying complex biological
processes on a global scale. Transforming these immense amounts of data into
biological information is a challenging task. In this tutorial, I will
present a statistical modeling language that addresses this challenge. The
language is based on Bayesian networks, represents heterogeneous biological
entities, and models the mechanism by which they interact. I will also
present statistical learning approaches in order to learn the details of
these models (structure and parameters) automatically from raw genomic data.
The biological insights are then derived directly from the learned model.
In this tutorial, I will describe three applications of this framework to
the study of gene regulation:
- Understanding the process by which DNA patterns (motifs) in the control
regions of genes play a role in controlling their activity. Using only DNA
sequence and gene expression data as input, these models recovered many of
the known motifs in yeast and several known motif combinations in human.
- Finding regulatory modules and their actual regulator genes directly from
gene expression data. Some of the predictions from this analysis were tested
successfully in the wet-lab, suggesting regulatory roles for three
previously uncharacterized proteins.
- Combining gene expression profiles from several organisms for a more
robust prediction of gene function and regulatory pathways, and for studying
the degree to which regulatory relationships have been conserved across
evolution.
Mr. Segal works on computational biology, focusing on exploiting genomic
data for the study of real world biological problems. He also develops
visualization and browsing tools that are easily accessible to biologists,
including GeneXPress, a generic software environment for visualization and
statistical analysis of heterogeneous genomic data. Segal holds a B.Sc. in
Computer Science from Tel Aviv University, and is currently a Ph.D.
candidate at Stanford (Computer Science).
PM9
Computational Methods in Phylogenetics
Tandy Warnow, Professor, University of Texas, Austin
The international systematic biology community is attempting to infer the "Tree of Life", an
evolutionary tree (or network) which will contain millions of leaves. A reasonably accurate estimation of this
history will require novel algorithms since current approaches for phylogenetic reconstruction (which
attempt to solve hard optimization problems) are not able to provide good analyses on datasets containing
thousands of sequences in reasonable time periods. This tutorial will address issues involved in developing
approaches which can enable highly accurate phylogenetic reconstructions. Specific topics that
will be addressed include:
- Stochastic models of evolution, issues with the models, and
statistical estimation under these models.
- The major optimization problems in phylogeny reconstruction - maximum likelihood and maximum parsimony.
- Evaluating reconstruction methods on real and on simulated data.
- New approaches for getting better solutions to hard optimization problems.
- Open problems.
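To separate the easy part of maximum parsimony from the hard part, the sketch below scores a single site on one fixed rooted binary tree using Fitch's algorithm. The tree encoding (nested 2-tuples with one-character leaf states) and the function name are assumptions of this sketch. Scoring a given tree this way is fast; the hard optimization problem referred to above is the search over the enormous space of possible trees.

```python
def fitch(tree):
    """Return (candidate_states, min_changes) for one site on a rooted
    binary tree given as nested 2-tuples with single-character leaves."""
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_changes = fitch(left)
    right_states, right_changes = fitch(right)
    common = left_states & right_states
    if common:
        # The children can agree on a state: no extra change needed here.
        return common, left_changes + right_changes
    # The children disagree: one substitution on one of the two branches.
    return left_states | right_states, left_changes + right_changes + 1

# Hypothetical four-taxon tree ((A, C), (A, G)) for a single site.
states, changes = fitch((("A", "C"), ("A", "G")))
print(states, changes)
```

For a full alignment, the parsimony score of a tree is the sum of these per-site counts over all columns.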
Tandy Warnow is Professor of Computer Sciences at the University
of Texas at Austin, and Emeline Bigelow Conland Fellow at
the Radcliffe Institute for Advanced Study.
Her research combines mathematics, computer science,
and statistics to develop improved models and algorithms
for reconstructing complex and large-scale evolutionary histories in both biology
and historical linguistics. She is on the board of directors
of the International Society for Computational Biology,
and previously was the Co-Director of the Center for Computational
Biology and Bioinformatics at the University of Texas at Austin.
Tandy received the National Science Foundation
Young Investigator Award in 1994, and the David and Lucile
Packard Foundation Award in Science and Engineering in 1996.
She is currently focusing her efforts
on the CIPRES Project (http://www.phylo.org, Cyber-Infrastructure
for Phylogenetic Research), which is an NSF-funded project to help build
a national computational infrastructure for large-scale phylogenetic reconstruction.
PM10
From Sequence to Structure: Protein Structure Prediction
Juntao Guo, Research Assistant Professor, Department of Biochemistry and Molecular Biology, University of Georgia
Ying Xu
The knowledge of the detailed structure of a protein holds the
key to our understanding of the biological function of the protein. Yet in
the post-genomics era the gap between the number of solved protein
structures and that of known protein sequences continues to expand
rapidly, largely due to the long and expensive process always required to
experimentally determine structures. Computational prediction of
structures from amino acid sequences has been successful in providing
useful information for the biological research community and is playing a
key role in bridging the gap.
This tutorial will introduce the basic aspects of protein structures and
the techniques for protein structure prediction. There are three major
methods for protein structure prediction: Comparative modeling, fold
recognition, and ab initio prediction. We will introduce these methods
with an emphasis on the threading technique. We will cover the key components
of threading: templates, energy functions, threading algorithms and
assessments.
Jun-tao Guo received his Ph.D. in Biochemistry and M.S. in Computer
Science from the University of Kentucky. He spent two years as a postdoc in
the Protein Informatics Group of Oak Ridge National Laboratory (ORNL)
before moving to the University of Georgia. He will become a research
assistant professor in the Department of Biochemistry and Molecular
Biology at the University of Georgia.