CSB2004

POSTER ABSTRACTS:
PATHWAYS NETWORKS SYSTEMS BIOLOGY

A New Hierarchical Method for Identification of Dynamic Regulatory Pathways from Time-Series DNA Microarray Data
Alireza Darvish, Eunsang Bak, Krishna Gopalakrishnan, and Kayvan Najarian
University of North Carolina

A new hierarchical method is proposed to analyze time-series DNA microarray data to identify the dynamic genetic pathways. The hierarchical technique first applies a specialized clustering technique to incorporate the available heuristic information about the biological system. Then, the prototypes of the resulting clusters are used as time-variables to develop an Auto Regressive (AR) model. Relating the expression of the prototypes to each other, as opposed to using the expression of every individual gene as a variable in the AR model, avoids having to estimate a large number of AR parameters from rather short time-series. The resulting model also allows the prediction of the gene expressions for the next time steps. The developed AR model can then be used to relate the expression value of each single gene to the genes of other clusters (through their prototype). The method also estimates the dynamic pathway among the prototypes and quantitatively expresses the dynamic interactions among the clusters time. The proposed method was applied to the cell-cycle dataset containing the DNA microarray time-series of a large number of genes involved in the eukaryotic cell-cycle. The technique resulted to a network of interactions among five clusters of genes in which the genes of each cluster have a biologically-meaningful trend in time. The predicted number the clusters (i.e. five), the function of each cluster (e.g. G1 development stage), and the interactions among genes agree with reported biological studies. This accordance indicates the success of the proposed method in discovering the dynamic genetic pathways.

Boosted PRIM with Application to Searching for Oncogenic Pathways of Cancer
Pei Wang (1) and Robert Tibshirani (2)
(1) Department of Statistics, and (2) Department of HRP & Statistics, Stanford University

Boosted Patient Rule Induction Method (PRIM) is a new algorithm developed for two-class classification problems. PRIM is a variation of those Tree-Based methods, seeking box-shaped regions in the feature space to separate different classes. Boosted PRIM is to implement PRIM-styled weak learners in Adaboost, one of the most popular boosting algorithms. In addition, we improve the performance of the algorithm by introducing some regularization to the boosting process, which supports the perspective of viewing boosting as a steepest-descent numerical optimization by Jerry Friedman (2001). The motivation for Boosted PRIM is to solve the problem of "searching for oncogenic pathways" based on array-CGH (Comparative Genomic Hybridization) data sets. The idea is to consider the pathway problem from a classification point of view, and infer the pathway structures through the well-selected classification rules. We illustrate the performance of the method through some simulation study as well as an application on an ovary cancer array-CGH data set (110 arrays in total). The simulation study shows that for data sets with reasonable sample sizes (~100), Boosted PRIM can accurately recover the pathway components. Moreover, on varying simulation data sets, Boosted PRIM outperformed those simple approaches, e.g. correlation matrixes or hierarchical clusters which are often used to study gene associations. On the real data set, Boosted PRIM recovers some gene associations confirmed in previous studies. It also suggests some new gene-gene interactions which casts light to future oncogenic study of ovary cancer.

Return to Poster Abstract Index
Return to Top

Genetic Algorithm for Inferring Time Delays in Gene Regulatory Networks
Fang-Xiang Wu
University of Saskatchewan

The real gene expression data example reveals a considerable number of time-delayed interactions suggesting that time delay is ubiquitous in gene regulation. Although many models have been proposed for gene regulatory networks, most of them are underestimated even if time delays are not accounted for. Recently we proposed the state-space model with time delays for gene regulatory networks. In order to uniquely determine all state transition matrices, p2(tmax+1) equations are needed, where p is the number of internal variables while tmax is the maximum number of the discrete time delays accounted for. However, only m*p equations are available implying that the system is underestimated while m < p(tmax+1), where m is the number of time points in gene expression dataset. Although the system can be uniquely determined under the assumption that a single time delay for every regulatory interaction, the solution space searching for the optimal one is still too large to use an exhaustive search method. This paper employs Boolean variables to capture the existence of the discrete time delays of the regulatory relationships among the internal variables, and proposes a genetic algorithm (GA) to determine the optimal Boolean variables (the optimal solution) and to further infer gene regulatory networks with time delays. Computational experiments have been performed on a real gene expression dataset. The results show that GA is effective to find the optimal solution, and that not only does the regulatory network with time delay obtained from the dataset possess the expected properties of a real gene regulatory network, but also improves the prediction accuracy by 72%, compared to gene regulatory networks without time delays.

Return to Poster Abstract Index
Return to Top

A Knowledge Base for Computational Pathway Reconstruction
Peng Wang, Zhengchang Su, Phuongan Dam, and Ying Xu
Computational System Biology Lab, UGA

The Synechococcus WH8102 knowledge base (http://www.csbl.bmb.uga.edu/WH8102) is a web based relational database developed to facilitate computational effort to reconstruct regulatory pathways and serve as a gateway for biologist to access the data. It is the repertoire that integrates a variety of knowledge derived both from literature and computational prediction. Those data are organized in hierarchical fashion. The basic building blocks are functional annotation and structure prediction of individual molecule. Those data are then organized into clusters based on computationally predicted operon, regulon and molecular complexes. Finally all data are complied into pathways derived from combined efforts of literature mining and computational prediction. A number of tools have been developed to facilitate the data retrieval including a SQL query engineer and several viewers to browse genome, molecular complexes and pathways.

Return to Poster Abstract Index
Return to Top

In silico Construction of the Carbon Fixation Pathway in Synechococcus sp. WH8102
Phuongan Dam, Zhengchang Su, Victor Olman and Ying Xu
Oak Ridge National Laboratories/ University of Georgia, Athens

Because the carbon fixation pathway plays an essential role in the primary production and natural carbon recycling process, and because the genome of the marine cyanobacterial Synechococcus sp. WH8102 (SYNWH8102) was recently sequenced, SYNWH8102 was chosen to further our understanding of the interaction and regulation of the carbon fixation pathway at the molecular level. In this poster, we present the carbon fixation pathway in SYNWH8102 as a result of our recently developed computational protocol for inference of regulatory and signaling pathways. Key components of the protocol are: (a) construction of template pathways from related microbial organisms, (b) mapping initial templates to the target genome, (c) refinement and expansion of the mapped pathway and (d) validation and refinement of the pathway models using experimental data or other information. During the refinement and expansion step, data mining tools including operon prediction, phylogenetic profile analysis, inference of protein-protein interactions, and binding site prediction program were used. The results of our pathway prediction include: (a) Major components of the carbon fixation pathway reported in the literature are present in SYNWH8102. (b) Approximately, 30 new candidates are added into the network from the results of the pathway expansion step. (c) Additionally, our in-house motif finding program, CUBIC, found 2 motifs that are present in the promoter regions of multiple genes involved in this pathway, suggesting that these genes are transcriptionally co-regulated. After experimental validation, novel candidates in the pathway will add new biological insights into the regulation and interaction between components of the carbon fixation pathway.

Return to Poster Abstract Index
Return to Top

A Finite Model Theory for Biological Hypotheses
Stephen Racunas, Christopher Griffin, and Nigam Shah
Penn State University

We have designed and implemented a set of software tools for the composition and evaluation of hypotheses about gene regulation in biological systems. Our software uses a unified formal grammar for the representation of both diagram-based and text-based hypotheses. The objective of this paper is to show how we can use this grammar as a basis for developing an effective logic for specifying hypotheses about biological systems in precise model-theoretic terms. To accomplish this, we take inspiration from inflationary extensions to fixed point logics and define a new type of logic: a deflationary logic for describing the effects of experiments upon models of biological systems. We present results that characterize the decidability, satisfiability, and inflationary/deflationary properties of this logic. We formally define what it means for a set of assertions to be discoverable under this new logic, and show that our software generates discoverable queries. Thus, we lay the groundwork for a formal treatment of machine-aided experimental design under the conceptual framework we have developed for our hypothesis evaluation software.

Return to Poster Abstract Index
Return to Top

Functional Modules from Protein Networks of Kinome and Cell Cycle in Saccharomyces cerevisiae
Hee Young Kang and Hak Yong Kim
Division of Life Sciences, College of Natural Sciences, Chungbuk National University, 361-763 Republic of Korea

We have constructed protein networks of kinome (proteins 2,130, interactions: 3,392, and clustering coefficient: 0.05362) and cell cycle (proteins: 2,099, interactions: 4,923, and clustering coefficient: 0.11668) from Saccharomyces cerevisiae genome [CYGD (Comprehensive Yeast Genome Database); http://mips.gsf.de/genre/proj/yeast/index.jps]. The networks show character of scale free network, in which a few highly connected proteins, called hub proteins, play a central role in most living organisms. Interpreting data from protein-protein interaction network obtained from experiments has been a challenging work because of the widespread presence of random false positives. To overcome these difficulties, we employ two considerations, cellular localization and DNA expression profile and then reconstructed purified kinome network (proteins: 1,161, interactions: 1,698, and clustering coefficient: 0.12204) and cell cycle network (proteins: 1,629, interactions: 4,155, and clustering coefficient: 0.14575). The purified kinome and cell cycle networks show characters of scale free network and hierarchical network, in which preserves its scale-free organization and displays inherent modularity of protein clusters. Integrating the information from the network may lead to the notion of a functional network and functional modules. To find these modules, we propose a new technique that is based on multi-body correlations in kinome or cell cycle network. From the derived modules, we predicted and estimated tentative functions for unannotated proteins with high certainty. Useful biological prediction regarding the lethality of the null mutants lacking hub proteins and/or major links could be made from both protein networks.

Return to Poster Abstract Index
Return to Top

Bacillus subtilis Protein Interaction Network Analysis
Olusola C. Idowu, Steven J. Lynden, Malcolm P. Young and Peter Andras
University of Newcastle upon Tyne, United Kingdom

Identifying functionally important proteins that are essential to the survival of a bacterial cell is of considerable interest in the development of new antimicrobial agents. In this study, we analyse the protein interaction network of the gram-positive bacteria Bacillus subtilis using data from the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database. Protein interaction networks were represented as graphs of interacting nodes and edges in which the nodes represent proteins and the edges represent interactions between proteins. We used novel computational techniques (bottlenecks, hubs, and second-order hubs) to analyse the topological structures of the network and to derive a list of the structurally important proteins within the network based on the relative damaging potential of each individual protein. We then ranked proteins in the order of structural importance within the network. Among 801 proteins involving 5017 interactions, 54% of proteins in the top 1% ranking selected by our method were encoded by functionally essential genes of Bacillus subtilis. Furthermore, a total of 38 structurally essential proteins were found to be encoded by functionally essential genes among the top 10% ranking, which account for 50% of the total number of essential proteins present in the protein interaction data analysed. These results show that the structural arrangement of proteins may provide vital clues about the functionality of proteins. The methods developed in this study could be used to rank proteins to determine their potential significance for drug target experiments.

Return to Poster Abstract Index
Return to Top

Dynamical Robustness in Gene Regulatory Networks
Jean-Loup Faulon and Shawn Martin
Sandia National Laboratories, Computational Biology Dept.

We investigate the robustness of biological networks, emphasizing gene regulatory networks. We define the robustness of a dynamical network as the magnitude of perturbation in terms of rates and concentrations that will not change the steady state dynamics of the network. If the network is limited to the Boolean activation-inhibition model, robustness becomes the number of different networks having the same set of attractors. We have systematically enumerated all activation-inhibition networks up to 6 nodes, and have sampled networks at random up to 20 nodes. These calculations show that most networks have unique steady state dynamics, while a few networks are highly robust. Precisely, if we plot the number of different steady state dynamics versus the dynamic robustness, we observe a power law. This result is unexpected since network characteristics for random graphs are known to follow Poisson distributions. We hypothesize that the power law behavior observed in biological networks is a consequence of dynamic robustness. To test this hypothesis we analyze the dynamics of three activation-inhibition networks: an E-coli transcriptional regulation network (Shen-Orr et al., 2002), a Yeast gene regulatory network (Guelzim et al., 2002), and a Yeast network inferred using microarray data from (Spellman et al., 1998). In these networks, we find that robust subgraphs occur more often than non-robust subgraphs, and that dynamical robustness increases with node degree. Finally, we propose a network growth model where nodes are added according to dynamic robustness and compare results with preferential attachment and gene duplication models.

Return to Poster Abstract Index
Return to Top

An Algorithm for Reconstruction of Markov Blankets in Bayesian Networks of Gene Expression Datasets
Catalin Barbacioru, Daniel J. Cowden, and Joel Saltz
Ohio State University

Bayesian networks are graph-based models of joint multivariate probability distributions that capture properties of conditional independence between variables. They have proved to be a useful and important tool in medicine for building decision support systems or in bioinformatics for discovering gene expression networks. We present a new local algorithm of polynomial complexity for learning Bayesian belief networks over a set of discrete variables. This algorithm generates a belief network close to the underlying model by recovering the Markov blanket of every node. The required sample size is dependent on the connectivity of the generating graph and not on the size of it. Therefore it yields to exponential savings in samples relative to previously known algorithms. We make use of bootstrap method in order to estimate confidence in the features of the learned networks, where confidence represents the likelihood that a given feature is actually true. At the current stage, we choose to focus on qualitative aspects of the data. We present an application of this algorithm on gene expression data in which we discretize these values into three categories: -1, 0, 1, depending whether the expression value is significantly lower, similar to or greater then the average level of the gene across all experiments. We applied our approach to cell cycle expression data of Spellman et al. [1], containing 76 gene arrays of 6177 S. cerevisiae ORFs. The analysis shows that we can recover intricate structures even from small data sets. Inspection of the top Markov relations between genes reveals that most of them are functionally related. Among these are the previously known relations between genes coding histones HTB1 and HHT1 (confidence 0.99), genes involved in mitosis, CLB1 and CLB2 (confidence 0.97), both known to be regulated by a combination of two transcription factors Mcm1p and SFF [2,3] or genes involved in DNA replication, CLN2 and CSI2 (confidence 0.96). The algorithm presented in this poster, successfully scale up BN-based causal discovery by adopting a local approach, and by reducing the total computational time to O(n2). We describe how to apply these techniques to gene expression data [1], and uncover many already validated and also not yet validated biological discoveries.

[1] Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B., Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization, Mol Biol Cell. 9(12):3273-97, 1998.
[2] Althoefer H, Schleiffer A, Wassmann K, Nordheim A, Ammerer G., Mcm1 is required to coordinate G2-specific transcription in Saccharomyces cerevisiae., Mol Cell Biol. 15(11):5917-28, 1995
[3] Sanders SL, Herskowitz I., The BUD4 protein of yeast, required for axial budding, is localized to the mother/BUD neck in a cell cycle-dependent manner, J Cell Biol. 134(2):413-27, 1996

Return to Poster Abstract Index
Return to Top

Synchronized Oscillations and Chaos in Coupled Genetic Repressilators
Jianbo Gao, Jesse Bridgewater, and Vwani P. Roychowdhury
Dept. of Electrical and Computer Engineering, Univ. of Florida

Living organisms are complex, and are typically composed of many interacting subsystems. In order to understand the complex genetic networks present in the whole cell, it is of crucial importance to first understand the dynamic behavior of modular genetic circuits. Recently a few subsystems on the genetic level, namely, genetic repressilators or oscillators, and bi-stable gene circuits, have been constructed and manipulated. In the former case, while mathematical model predicts simple oscillations, it has been observed that period varies from one oscillation to another considerably. Realizing that laboratory biochemical experiments take place in space in a distributed way, we study coupled repressilators. Several different types of coupling are considered. We find that synchronized oscillations may occur between nearly matched oscillators. It is also found that chaos can occur via period doubling route. This work thus naturally explains why the periods of the repressilators observed in the experiments vary so considerably. It is further disucssed that the feature of period doubling might have occurred experimentally.

Return to Poster Abstract Index
Return to Top

Inference of Gene Regulatory Network Based on Module Network Model with Gene Functional Classifications
Kohei Taki, Reiji Teramoto, Yoichi Takenaka, and Hideo Matsuda
Dept. of Bioinformatics Engineering, Graduate School of Information Science and Technology, Osaka University

Exhaustive inference of gene regulatory networks from genome-wide expression profile is an effective approach for understanding the complex biological processes. The inference among numerous genes potentially involves the problematic explosion of gene combinations. Module network model alleviates such explosion through the notion of module. In the notion a module is defined as a set of similar genes. However the conventional inferences of modules were performed only based on the similarity of expression patterns. Therefore the module inferences were significantly affected with the measurement errors of expression profiles. The resulting modules might include artificial interactions. In this study, we propose a novel method for the module inference based on gene functional classification. In our method, a module is built with not only genes with similar expression patterns but also genes classified into similar functional categories. The similarity between functional categories is estimated with semantic similarity. The criterion gives statistical estimations based on the notion of information contents for functional categories in the functional classification. Because genes having similar functions show similar expression patterns, regarding the similarity in gene function prevents the measurement errors from confusing the module inference. Therefore our module inference is robust against the errors. By applying our method we performed inference of the gene regulatory networks of yeast cell-cycle. The modules inferred by our method show consistency with experimentally-determined results on yeast cell-cycle, especially on G1 phase. We infer informative regulatory relationships based on robust modules built by the proposed method affected with less measurement errors.

Return to Poster Abstract Index
Return to Top

A Graph Analysis Method to Detect Metabolic Sub-networks Based on Phylogenetic Profile
Shoko Miyake, Yoichi Takenaka, and Hideo Matsuda
Genome Information Engineering, Dept. of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University

To elucidate fundamental constituting principle of functional modules or building blocks of metabolic networks, computational methods to analyze the network structure of metabolism are getting much attention. We propose a graph search method to extract highly conserved sub-networks of metabolic networks based on phylogenetic profile. Our method consists of three steps: (1) represent a metabolic network as a bipartite graph, which contains two types of nodes representing compounds and enzymatic reactions, respectively, (2) assign a reaction-conservation score to each enzymatic reaction node and a compound-conservation score to each compound node according to the phylogenetic profile of the enzymatic reactions, and (3) explore sub-networks which are globally conserved among various organisms based on the basic depth-first graph search algorithm with pre-defined thresholds for the conservation scores. We formulated reaction-conservation score for the measure of the phylogenetic conservation of reactions. We also formulated compound-conservation score to eliminate biologically-meaningless compounds and reduce the size of the networks. By applying our approach to the metabolic networks of 19 representative organisms selected from bacteria, archaea, and eukaryotes in the KEGG database, we detected some highly conserved sub-networks among the organisms. Comparing them to the metabolic maps in KEGG, we found they were mainly included in energy metabolism, sugar metabolism, and glutamate metabolism.

Return to Poster Abstract Index
Return to Top

Parallel Extreme Pathway Computation for Metabolic Networks
Lie-Quan Lee, Jeff Varner, and Kwok Ko
Stanford Linear Accelerator Center

Computation of the complete set of Extreme Pathways for genome-scale metabolic networks requires parallel processing because of exponential complexity. We have parallelized the extreme pathway algorithm presented by Schilling, et al in J. Theor. Biol. 203 (2000) using the Message Passing Interface (MPI). In the parallel implementation, the tableau consisting of the transpose of the stoichiometric matrix and the pathway matrix is partitioned row-wise. The algorithm decides the pivoting column in each iteration by selecting the column that produces the least number of new rows. A set of new rows is formed from the local tableau and the tableau in a remote MPI node. The local and remote rows are then checked, in a distributed manner, for systemic independence. The remote MPI node involved in forming new rows is then rotated to form all possible combinations. Communication among MPI nodes is drastically reduced by using bit representation for the pathway matrix and by exchanging the systemically independent rows instead of the newly-formed rows in checking for systemic independence. The parallel algorithm exhibits superlinear scalability because the number of independence tests performed decreases as the number of MPI nodes increases. A subsystem of the metabolic network of Escherichia coli with 140 reactions and 96 metabolites (without preprocessing) is used as a benchmark. The extreme pathways of this system are computed in under 280 seconds using 70 2.4GHz Intel Pentium IV CPUs with Myrinet interconnection among the dual-CPU nodes of the Linux cluster.

Return to Poster Abstract Index
Return to Top

Computational Construction of Nitrogen Assimilation Pathway in Cyanobacteria Synechococcus sp. WH8102
Z. Su, P. Dam, X. Chen, V. Olman, T. Jiang and Y. Xu
Dept. of Biochemistry and Molecular Biology, Univ. of Georgia, and Institute of Computational Biology, Oak Ridge National Laboratory; Dept. of Computer Science and Engineering, Univ. of California at Riverside

One of the challenges in post-genomic era research is to infer the pathways of a less-studied organism by solely comparing its genomic sequence to those of other well-studied ones. We have previously developed a computational protocol for inference of regulatory and signaling pathways in less-studied microbes, through mining high-throughput biological data of various types. Using a similar protocol we have constructed the nitrogen assimilation pathway in cyanobacteria Synechoccocus sp. WH8102. A considerable amount of high confident knowledge of this pathway was derived through mining various sources of high throughput data. 1. Synechococcus sp. WH8102 could utilize ammonium, nitrate/nitrite, cyanate, and urea as nitrogen source, but it lacks nitrogen fixation machinery. 2. A candidate set of proteins that might be involved in the coupling between CO2 fixation and nitrogen assimilation regulation was suggested. These proteins are sought by biologists for decades, narrowing down the candidates to an amenable set might provide a great chance for experimentalist to solve this mystery quickly. 3. The cis-regulatory elements for nitrogen global regulator ntcA and its associated reglons in Synechococcus sp. WH8102 were predicted, and compared to those known in other cyanobacteria. New members of the nctA regulon were predicted for experimental validation. 4. A few of regulatory proteins are predicted to be involved in the nitrogen assimilation pathway. They are promise targets for experimentalists to further elucidate this important pathway.

Acknowledgement: DOE office of Biological/Environmental Research, Genome to Life Project, "Carbon Sequestration in Synechococcus sp.: From Molecular Machines to Hierarchical Modeling."

Return to Poster Abstract Index
Return to Top

Pathway Mapping With Operon Information, An Integer-Programming Method
Fenglou Mao, Victor Olman, Zhengchang Su, Ying Xu
University of Georgia

Biological pathway mapping is an important problem in the post-genomic era. We now present a new algorithm for pathway mapping in microbes. The algorithm considers not only sequence similarity among the template and target genes, but also the operon structures in the target genome. We formulated the mapping problem as a graph finding problem, and solved it by an Integer-Programming (IP) method. The goal is to maximize a linear object function subject to six constraints, such that maximal sequence similarity among the template and target genes are achieved, and at the same time, a minimal number of operons are covered in the target genome. Compared to our previous Minimal Spanning Tree (MST) algorithm, the IP method has the following advantages: i) It is much faster and thus can map larger pathway involving a much large set of genes. ii) The IP method looks into the details of genes in the operons, and consequently avoids the many-to-one mapping mistakes that sometimes occur in the MST algorithm. We have compiled a large pathway training set to optimize the parameters of the program, and tested it by mapping 28 complex pathways from BioCyc onto E.coli K12 genome and the results are very promising.

Return to Poster Abstract Index
Return to Top

Analysis of Signaling Pathways in Human T-Cells Using Bayesian Network Modeling of Single Cell Data
Karen Sachs, Omar D. Perez, Dana Peer, Garry P. Nolan, David K. Gifford, Tommi S. Jaakkola and Douglas A. Lauffenburger
MIT

We applied Bayesian networks, a probabilistic modeling tool, to flow cytometry measurements of signaling molecules in human T-cells under various stimulatory and inhibitory conditions. Recent advances in multi dimensional flow cytometry enable the measurement of multiple (~12) intracellular events at the molecular level, in single-cell resolution (1). We use Bayesian networks, a class of graphical models, to address questions of dependence and connectivity among biomolecules of interest in the flow cytometry dataset, consisting of MAPK and Pi3K pathway components. These models extract probabilistic dependencies from data (Fig.1). Single-cell data of simultaneously observed biomolecules provides probabilistic power for modeling of molecular dependencies. In particular, the acyclic influence diagram among measured variables in a dataset can be extracted. Our resulting model contains expected connections known from the literature, apparently incorrect connections that, upon closer examination, reveal real biology, and as yet unconfirmed, putative novel connections that pose testable biological hypotheses. Our work demonstrates that Bayesian network modeling of multi dimensional flow cytometry data can help elucidate underlying signaling pathways and provide insight into the influence structure of molecules of interest. We are now in the exciting stage of pursuing hypotheses formulated from our model. Figure 1. A small Bayesian network showing some dependencies extracted from flow cytometry measurements of representatives from the TNF and EGF signaling pathways. In the Bayesian network, the nodes (boxes and ovals) represent biomolecules while the edges represent dependencies.

References
1. L. Herzenberg, D. Parks, B. Sahaf, O. Perez, M. Roederer and L. Herzenberg, The History and Future of the Fluorescence Activated Cell Sorter and Flow Cytometry: A View from Stanford. Clinical Chemistry, (2002), 48:10, 1819:1827.

Return to Poster Abstract Index
Return to Top

An Efficient Bayesian Method for Biological Pathway Discovery from High-throughput Experimental Data
Wei Wang and Gregory F. Cooper
Center for Biomedical Informatics, University of Pittsburgh

INTRODUCTION This poster describes a novel Bayesian method for discovering intra-cellular pathways from high throughput data.
METHODS The approach builds on previous work by Wagner (2001), who described a deterministic pathway discovery algorithm. We generalize that approach to create a Bayesian algorithm that combines experimental data (e.g., DNA microarray measurements) with prior belief (e.g., known gene-regulation pathways) to produce as output a probability distribution over the possible causal relationships between each pair of variables.
RESULTS We applied the Bayesian pathway discovery algorithm to gene expression data from a study published by Ideker et al. (2001) on galactose metabolism in yeast. The original Wagner algorithm is used as a point of comparison. The Area Under the ROC curve (AUROC) for the Wagner algorithm is 0.71. For the Bayesian algorithm the AUROC is 0.87, with a 95% confidence interval of (0.77, 0.94). Thus, the Bayesian method performed statistically significantly better than Wagner algorithm. The Bayesian algorithm has a time complexity that is quadratic in the number of variables and it required only 0.09 seconds to analyze the Ideker dataset.

REFERENCES
Wagner, A. (2001) How to reconstruct a large genetic network from n gene perturbations in fewer than n2 easy steps. Bioinformatics 17, 1183-1197.
Ideker, T., et al. (2001) Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292, 929-934.

Return to Poster Abstract Index
Return to Top

Applying Two-level Simulated Annealing on Bayesian Structure Learning to Infer Genetic Networks
Tie Wang (1), Jeff Touchman (2), and Guoliang Xue (1)
(1)Computer Science and Engineering Department, ASU; (2) School of Life Science, ASU

Recent high-throughput molecular biology has motivated the development of algorithms and tools for analyzing gene expression data. Bayesian network is a common approach to study gene regulatory networks. Here, we explore the problem of inferring Bayesian structure from data that can be viewed as a search problem. The goal is to find a global optimized probability network model given the data. Currently used methods need to order nodes in advance, which ignores marginal independency. In this work, we propose a new search algorithm: Two-level Simulated Annealing (TLSA). This algorithm does not need to order the nodes, and marginal independency will be considered. TLSA performs simulated annealing in two levels with strengthened local optimizer, and is less likely to get tracked at local optimizer. We apply TLSA to the Bayesian structure learning problem. We build up a golden network (GN) with 10, 20, 30 and 40 nodes respectively and random datasets are generated from GNs using the Monte Carlo method. We use the resultant sampled data to test Bayesian scores that infer the strength of learning network structures in the design of the TLSA algorithm. The simulation is performed on 10 networks for each of the 4 groups of datasets. Our experimental results demonstrate that TLSA performs better than other Bayesian structure learning algorithm K2, and better scores are achieved in all tested cases.

Return to Poster Abstract Index
Return to Top

HOME • REGISTRATION • PAPERS • POSTERS • TUTORIALS • PROGRAM • KEYNOTE SPEAKERS • INVITED SPEAKERS
SPECIAL EVENTS • COMMITTEES • SPONSORS • NEWS ROOM • CONTACT US • PREVIOUS CONFERENCES