CSB2004

Tutorial Abstracts

AM1

Introduction to Evolutionary and Functional Genomic Analysis
Cristian Castillo-Davis, Post-doc, Harvard University

Genomes are being released at a dizzying pace and high-throughput methods are creating mountains of comparative data. How can we integrate and use these resources in a productive and meaningful way?

In this tutorial we will cover some of the basic concepts involved in comparative genomic and functional genomic analysis and learn a few basic techniques that should help any worker get started analyzing data from a genomic perspective. Attention will be paid to different methods of measuring protein evolution, building evolutionary trees, and determining homology (orthology and paralogy). Integration of evolutionary and genome-scale data with gene annotations, microarray data, and EST data in a statistical framework will be a major focus. In particular we will learn how to use the simple but versatile genomic tool GeneMerge (http://www.oeb.harvard.edu/hartl/lab/publications/GeneMerge/index.html).

No programming knowledge and little biological background is assumed. The amount of time spent on each topic will be determined by participant interest.

Cristian Castillo-Davis received his Ph.D. in Organismic and Evolutionary Biology from Harvard University and is currently a post-doctoral research fellow in the Department of Statistics (also at Harvard) with Jun S. Liu. His current research is aimed at uncovering the mechanics of cis-regulatory sequence evolution and gene network evolution using both laboratory and computational approaches.

AM2

Tandem Mass Spectrometry in Proteomics
Ming Li
Bin Ma, Assistant Professor and a Tier 2 Canada Research Chair in Bioinformatics, Department of Computer Science, University of Western Ontario

The applications of mass spectrometry and tandem mass spectrometry in proteomics, including protein identification and protein quantitative analysis, will be introduced.

The tutorial will start with an overview of the different types of mass spectrometers and tandem mass spectrometers. Different computational methods for protein identification, including database search and de novo sequencing, will then be discussed. Finally, Isotope-Coded Affinity Tag (ICAT) analysis for protein quantitation will be briefly introduced.

Bin Ma is currently an Assistant Professor and a Tier 2 Canada Research Chair in Bioinformatics in the Department of Computer Science at the University of Western Ontario. He is also the CTO of Bioinformatics Solutions Inc. He received his Ph.D. degree from Peking University in 1999, and was a recipient of the Ontario Premier's Research Excellence Award in 2003 for his research in bioinformatics.

AM3

How to Use the Genome Browsers to Get the Most Out of Public Genomes
Daryl Thomas, Ph.D. candidate, University of California, Santa Cruz

The field of bioinformatics is playing an increasingly large role in the study of fundamental biomedical problems due to the explosion of sequence, structural, and functional information available to researchers. The challenge facing biologists, especially in light of the vast amount of data being produced by the Human Genome Project and other large scale efforts, will be to analyze such information to reveal previously unknown relationships with respect to gene and protein structure and function. The primary aim of this session is to expose scientists with minimal background in bioinformatics to the methods used for browsing and analyzing the vast amount of publicly available data. The session will include case-driven demonstrations of freely available online browsers and the development of custom tracks to display in-house data in the same context. The practical use of these resources will be emphasized through live demonstrations where possible.

Mr. Thomas is focused on integrating human variation data with comparative genomics in the UCSC Genome Bioinformatics group. He has developed the human variation tracks for the UCSC browser, is working toward improved multiple sequence alignment algorithms, and is developing the repository for the ENCODE community. Related previous work includes detection and genotyping of polymorphisms at Affymetrix and Perlegen, studies of neuronal plasticity at UC Davis, and the cloning and characterization of several transcription factors at HHMI. Mr. Thomas holds a B.S. in chemistry from Carnegie Mellon University and an M.S. in computer science from the University of California at Santa Cruz.

AM4

Computational Genetics: Haplotype Inference and Applications in Human Disease Gene Mapping
Tianhua Niu, Assistant Professor of Medicine, Harvard Medical School, and Director of Bioinformatics at the Division of Preventive Medicine, Dept. of Medicine, Brigham and Women's Hospital

With the advent of the international HapMap project, there has been a surging interest in tackling statistical challenges in haplotype phasing using genotype data of multiple linked single nucleotide polymorphisms. In this tutorial, I'll cover the following topics:

(a) Haplotype inference. I'll illustrate the idea of using a partition-ligation (PL) approach in haplotype inference within both Bayesian and Expectation-Maximization frameworks. Simulated and real-world data are used to compare the performance of various statistical haplotype inference algorithms. Moreover, I'll introduce a new concept, GenoSpectrum (based on probabilistic genotype calls); a new genotype clustering algorithm based on a t-mixture model, called GeneScore; and a new haplotype phasing algorithm, GS-EM, that can handle ambiguous genotype data.

(b) Linkage Disequilibrium (LD) Analysis. I'll illustrate the use of permutation tests, likelihood ratio tests, logistic regression models, and Bayesian statistical models in performing haplotype-based LD analyses, with applications in preterm delivery, Alzheimer disease, and secondary hyperparathyroidism.

I'll present my vision of the future directions of haplotype analyses in the concluding remarks.

Dr. Tianhua Niu is an Assistant Professor of Medicine at the Harvard Medical School, and Director of Bioinformatics at the Division of Preventive Medicine, Dept. of Medicine, Brigham and Women's Hospital. Dr. Niu received his doctoral degree in Biologic Sciences at Harvard University, trained jointly in molecular genetics and genetic epidemiology in 1998. Dr. Niu also holds an M.S. degree in Computer Science from Northeastern University, and is a member of the IEEE Computer Society. Dr. Niu did his post-doctoral training in statistical genetics and bioinformatics, and has authored/co-authored over fifty publications. Dr. Niu and his colleagues developed novel statistical methodologies for genetic analyses, including POLYMORPHISM, HAPLOTYPER, PLEM, GeneScore, GSEM, and several visualization bioinformatics software packages such as SeqVIST.

AM5

Introduction to Dynamic Programming and Its Applications to Bioinformatics
Robert Edgar, Visiting Scholar, UC Berkeley

Dynamic programming (DP) is a widely used technique in computational biology and is an essential skill for anyone with an interest in algorithm development or maintenance. DP is fundamental to most sequence comparison algorithms, including BLAST, CLUSTALW, whole-genome aligners, hidden Markov models and many other applications. This tutorial will introduce both the mathematical and practical aspects of DP with a focus on developing a solid understanding of the basic concepts, giving students the ability to understand advanced textbooks and the literature. Starting with the most fundamental sequence comparison algorithm, computing the edit distance of a pair of strings, the course will cover global and local alignment (Needleman-Wunsch and Smith-Waterman algorithms), affine gap penalties, hidden Markov models (Viterbi algorithm), profile-profile and multiple alignment, and whole-genome alignment. For each type of algorithm, we will discuss examples of well-known bioinformatics programs, such as BLAST, CLUSTALW, and HMMER, for which source code is available. Students are expected to have a working knowledge of programming in the C language; the relevant biology will be briefly covered as needed.
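
As a rough illustration of the starting point named above, the edit distance of a pair of strings can be computed with a small DP table. The sketch below is not taken from the tutorial materials: it is written in Python rather than C for brevity, assumes unit costs for insertion, deletion, and substitution, and the function name `edit_distance` is a hypothetical choice.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit (Levenshtein) distance, assuming unit cost per operation."""
    # prev[j] is the distance between the prefix of `a` processed so far
    # (initially empty) and b[:j]; only one previous row is kept in memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))
```

Keeping only two rows gives memory linear in the length of the second string; recovering the alignment itself, as in global alignment, would require keeping the full table or traceback pointers.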

Robert C. (Bob) Edgar is a Visiting Scholar at UC Berkeley. Bob received a Ph.D. in high-energy theoretical physics from University College London in 1982. He subsequently founded and ran a software company that was sold to Intel Corp. in 1999. In 2002 he retired from Intel and began work in computational biology in collaboration with scientists at UC Berkeley and elsewhere. His work has primarily focused on sequence alignment methods with a particular interest in hidden Markov models and multiple alignment, leading to seven published papers over the past two years.

PM6

Bioinformatics: The Machine Learning Approach
Pierre Baldi, Professor, University of California, Irvine

Machine learning approaches play a significant role in bioinformatics due to the abundance of highly variable data and the lack of comprehensive theories. We will provide a brief overview of machine learning approaches in bioinformatics including:

The Bayesian statistical framework for modeling and induction as the common foundation for all machine learning and data mining algorithms.

Some of the main model classes, such as neural networks, hidden Markov models, Bayesian networks and graphical models, and stochastic context-free grammars.

Examples of specific applications such as:

-neural networks for the prediction of protein functional sites and secondary and tertiary structure;

-hidden Markov models of biological sequences for database searches, multiple alignments, pattern discovery, and gene finding.

Reference:
P. Baldi and S. Brunak, "Bioinformatics: the Machine Learning Approach," MIT Press, second edition 2001.

Pierre Baldi is a Professor in the School of Information and Computer Science and the Department of Biological Chemistry and the Director of the Institute for Genomics and Bioinformatics at the University of California, Irvine. Born and raised in Europe, he received his PhD from the California Institute of Technology in 1986. From 1986 to 1988 he was a postdoctoral fellow at the University of California, San Diego. From 1988 to 1995 he held positions as a faculty member and as a member of the technical staff at the California Institute of Technology and at the Jet Propulsion Laboratory. He was CEO of a startup company from 1995 to 1999 and joined UCI in 1999. He is the recipient of a 1993 Lew Allen Award at JPL and a Laurel Wilkening Faculty Innovation Award at UCI. Dr. Baldi has written over 100 research articles and four books:

Modeling the Internet and the Web--Probabilistic Methods and Algorithms, Wiley, (2003);
DNA Microarrays and Gene Regulation--From Experiments to Data Analysis and Modeling, Cambridge University Press, (2002);
The Shattered Self--The End of Evolution, MIT Press, (2001);
Bioinformatics: the Machine Learning Approach, MIT Press, Second Edition (2001).

His research focuses on AI, machine learning, and bioinformatics.

PM7

Using dChip for Microarray and SNP Chip Data Analysis
Yu Guo, Ph.D. candidate, Harvard School of Public Health

DNA-Chip Analyzer (dChip) is a software package implementing model-based expression analysis of oligonucleotide arrays (Li and Wong 2001a) and several high-level analysis procedures. The model-based approach allows probe-level analysis on multiple arrays. By pooling information across multiple arrays, it is possible to assess standard errors for the expression indexes. This approach also allows automatic probe selection in the analysis stage to reduce errors due to cross-hybridizing probes and image contamination. High-level analysis in dChip includes comparative analysis and hierarchical clustering. Also see the comparison with Affy MAS software.

Topics will include brief tutorials on the functions of dChip and on using dChip for microarray and SNP analysis. A live demonstration can be arranged.

http://biosun1.harvard.edu/complab/dchip/

Yu Guo is a third-year PhD student in the Department of Biostatistics, Harvard School of Public Health. Her general area of interest is microarray data analysis, which she pursues under the guidance of Dr. Cheng Li, the author of the dChip software. Her work involves whole-genome comparative genomic analysis, normalization issues in mRNA microarray analysis, and correlation of microarray data with clinical variables.

PM8

Discovering regulatory networks from gene expression and promoter sequence
Eran Segal, Ph.D. candidate, Stanford University

Genomic datasets, spanning many organisms and data types, are rapidly being produced, creating new opportunities for understanding the molecular mechanisms underlying human disease, and for studying complex biological processes on a global scale. Transforming these immense amounts of data into biological information is a challenging task. In this tutorial, I will present a statistical modeling language that addresses this challenge. The language is based on Bayesian networks, represents heterogeneous biological entities, and models the mechanism by which they interact. I will also present statistical learning approaches that learn the details of these models (structure and parameters) automatically from raw genomic data. The biological insights are then derived directly from the learned model.

In this tutorial, I will describe three applications of this framework to the study of gene regulation:

Understanding the process by which DNA patterns (motifs) in the control regions of genes play a role in controlling their activity. Using only DNA sequence and gene expression data as input, these models recovered many of the known motifs in yeast and several known motif combinations in human.
Finding regulatory modules and their actual regulator genes directly from gene expression data. Some of the predictions from this analysis were tested successfully in the wet-lab, suggesting regulatory roles for three previously uncharacterized proteins.
Combining gene expression profiles from several organisms for a more robust prediction of gene function and regulatory pathways, and for studying the degree to which regulatory relationships have been conserved across evolution.

Mr. Segal works on computational biology, focusing on exploiting genomic data for the study of real world biological problems. He also develops visualization and browsing tools that are easily accessible to biologists, including GeneXPress, a generic software environment for visualization and statistical analysis of heterogeneous genomic data. Segal holds a B.Sc. in Computer Science from Tel Aviv University, and is currently a Ph.D. candidate at Stanford (Computer Science).

PM9

Computational Methods in Phylogenetics
Tandy Warnow, Professor, University of Texas, Austin

The international systematic biology community is attempting to infer the "Tree of Life", an evolutionary tree (or network) which will contain millions of leaves. A reasonably accurate estimation of this history will require novel algorithms since current approaches for phylogenetic reconstruction (which attempt to solve hard optimization problems) are not able to provide good analyses on datasets containing thousands of sequences in reasonable time periods. This tutorial will address issues involved in developing approaches which can enable highly accurate phylogenetic reconstructions. Specific topics that will be addressed include:

Stochastic models of evolution, issues with the models, and statistical estimation under these models.

The major optimization problems in phylogeny reconstruction - maximum likelihood and maximum parsimony.

Evaluating reconstruction methods on real and on simulated data.

New approaches for getting better solutions to hard optimization problems.

Open problems.
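
To make the maximum parsimony criterion in the topic list concrete, the sketch below scores a fixed tree by counting the minimum number of state changes it requires. It uses Fitch's algorithm on a rooted binary tree, which is our choice of illustration and not necessarily the method presented in the tutorial. The nested-tuple tree encoding, the function names, and the toy alignment are all hypothetical.

```python
def fitch(tree, states):
    """Return (candidate_states, changes) for one site on a rooted binary tree.

    `tree` is either a leaf name (str) or a pair (left, right);
    `states` maps each leaf name to its character at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children can agree on a state: no extra change at this node.
        return common, left_changes + right_changes
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the per-site Fitch change counts over an equal-length alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[k] for name, seq in alignment.items()})[1]
        for k in range(n_sites)
    )


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    alignment = {"A": "AC", "B": "AC", "C": "GC", "D": "GT"}
    print(parsimony_score((("A", "B"), ("C", "D")), alignment))
    print(parsimony_score((("A", "C"), ("B", "D")), alignment))
```

Scoring one tree is the easy part. The hard optimization problem mentioned above is the search over all possible trees for the one with the lowest score.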

Tandy Warnow is Professor of Computer Sciences at the University of Texas at Austin, and Emeline Bigelow Conland Fellow at the Radcliffe Institute for Advanced Studies. Her research combines mathematics, computer science, and statistics to develop improved models and algorithms for reconstructing complex and large-scale evolutionary histories in both biology and historical linguistics. She is on the board of directors of the International Society for Computational Biology, and previously was the Co-Director of the Center for Computational Biology and Bioinformatics at the University of Texas at Austin. Tandy received the National Science Foundation Young Investigator Award in 1994, and the David and Lucile Packard Foundation Award in Science and Engineering in 1996. She is currently focusing her efforts on the CIPRES Project (http://www.phylo.org, Cyber-Infrastructure for Phylogenetic Research), which is an NSF-funded project to help build a national computational infrastructure for large-scale phylogenetic reconstruction.

PM10

From Sequence to Structure: Protein Structure Prediction
Juntao Guo, Research Assistant Professor, Department of Biochemistry and Molecular Biology, University of Georgia
Ying Xu

The knowledge of the detailed structure of a protein holds the key to our understanding of the biological function of the protein. Yet in the post-genomics era the gap between the number of solved protein structures and that of known protein sequences continues to expand rapidly, largely due to the long and expensive process always required to experimentally determine structures. Computational prediction of structures from amino acid sequences has been successful in providing useful information for the biological research community and is playing a key role in bridging the gap.

This tutorial will introduce the basic aspects of protein structures and the techniques for protein structure prediction. There are three major methods for protein structure prediction: comparative modeling, fold recognition, and ab initio prediction. We will introduce these methods with an emphasis on the threading technique. We will cover the key components of threading: templates, energy functions, threading algorithms, and assessments.

Jun-tao Guo received his Ph.D. in Biochemistry and M.S. in Computer Science from the University of Kentucky. He spent two years as a postdoc in the Protein Informatics Group of Oak Ridge National Laboratory (ORNL) before moving to the University of Georgia. He will become a research assistant professor in the Department of Biochemistry and Molecular Biology at the University of Georgia.
