CSB2004

POSTER ABSTRACTS:
PROTEIN STRUCTURE ANALYSIS

Computational Analysis and Classification of p53 Mutants According to Primary Structure
Krishna Gopalakrishnan, Alireza Darvish, Kayvan Najarian
University of North Carolina At Charlotte

There is a pressing need for fast and accurate methods for the classification and analysis of mutated proteins. A single-base mutation may or may not change the structure and function of the wild-type protein. Widely used classification techniques based on multiple alignment can give false results in such cases because the primary sequences of the mutants and of the wild type are very similar. We present a signal processing based technique for the classification and analysis of proteins produced by mutation of the wild type. The proposed technique combines signal processing methods with the biochemical properties of individual amino acids. Each amino acid in the mutant protein is replaced with the value of a corresponding biochemical property, such as molecular weight or hydrophobicity. This substitution generates a set of biochemical signals, from which signal processing features such as complexity, mobility, fractal dimension, and wavelet transformation are extracted. In an experimental study of the p53 protein, mutants resulting from single mutations of the eight residues of the β-strand 326-333 to alanine were analyzed for their ability to stimulate transcription, to inhibit the growth of Saos-2 cells, and to repress the promoter of the multidrug resistance gene. Our computational technique produces three clusters. The first cluster contains mutants L330A and I332A, the second contains F328A, and the last contains E326A, Y327A, T329A, Q331A, and R333A. Our classification results, based solely on the analysis of primary sequences, match those of the experimental studies.
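
The substitution step and two of the named features can be illustrated with a minimal sketch. It assumes the Kyte-Doolittle hydropathy scale as the biochemical property and the standard Hjorth definitions of mobility and complexity; the abstract does not say which scale or which exact definitions were used. The eight-residue segment EYFTLQIR is read off the mutant names listed above, and the function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (assumed choice of biochemical property).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def to_signal(seq, scale=KD):
    """Replace each residue with its property value to obtain a numeric signal."""
    return [scale[aa] for aa in seq]

def variance(x):
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def diff(x):
    """First difference, the discrete analogue of a derivative."""
    return [b - a for a, b in zip(x, x[1:])]

def hjorth(x):
    """Return (mobility, complexity) using the standard Hjorth definitions."""
    dx = diff(x)
    ddx = diff(dx)
    mobility = math.sqrt(variance(dx) / variance(x))
    complexity = math.sqrt(variance(ddx) / variance(dx)) / mobility
    return mobility, complexity

def mutate(seq, index, new="A"):
    """Substitute one residue (alanine by default) at a 0-based index."""
    return seq[:index] + new + seq[index + 1:]

wild_type = "EYFTLQIR"  # residues 326-333, inferred from the mutant names
for i, aa in enumerate(wild_type):
    mobility, complexity = hjorth(to_signal(mutate(wild_type, i)))
    print(f"{aa}{326 + i}A mobility={mobility:.3f} complexity={complexity:.3f}")
```

In practice the features would be computed over the full-length sequence and for several property scales before clustering; the short segment here only keeps the example readable.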

In Silico Prediction of Surface Residue Clusters for Enzyme-substrate Specificity
Gong-Xin Yu, Byung-Hoon Park, Praveen Chandramohan, Rajesh
Computational Biology Institute, Oak Ridge National Laboratory

One of the most remarkable properties of enzyme-substrate binding is the high substrate specificity among highly homologous enzymes. Identifying the key residues, or clusters of residues, involved in substrate recognition offers an opportunity to understand the underlying molecular mechanisms and to guide mutagenesis experiments toward relevant residues, thus accelerating progress in bioprocess engineering and drug design. We reason that residues involved in such recognition are most likely clustered on the protein surface and may interact with their neighboring residues. We report a computational procedure that predicts such clusters of specificity-determining residues among highly homologous functional protein groups. Current methods identify conserved residues but largely ignore non-conserved residues and their potential contributions; our method is designed to overcome this limitation. In case studies, we investigated two highly homologous enzyme pairs (referred to as functional sub-types), guanylyl cyclases vs. adenylyl cyclases and lactate dehydrogenases vs. malate dehydrogenases, and applied the algorithm to plant and cyanobacterial RuBisCo protein complexes, which differ dramatically in CO2/O2 specificity. Without using experimental data, we identified mono-residue as well as multi-residue clusters and obtained considerable agreement with experimental results. Specifically, some of the identified clusters, primarily the mono-residue ones, cover residues that are directly involved in substrate-enzyme interactions. Others, mainly multi-residue ones, cover residues vital for domain-domain and regulator-enzyme interactions, indicating potential roles for these functionally non-specific yet complementary residues in specificity determination.


Comparing 3D Protein Structures Similarity by Using Fractal Features
Chenyang Cui, Donghui Wang, Yingsha Zhang, and Jiaoying Shi
Donald Danforth Plant Science Center

In this paper, we propose a new method for measuring similarity when comparing 3-D protein structures. Unlike existing methods, ours is grounded in the theory of fractal geometry, since proteins have an intrinsic self-similarity in the compactness and packing of their structure. We propose three fractal features of the protein backbone. These features are invariant to rotation, translation, and scaling of the protein molecule, and they are simple to compute. The method is very fast because it requires neither alignment of the chains nor any chain-chain comparison. The experimental results show that our method is very effective in classifying 3-D protein structures and is suitable for global matching of 3-D protein structures. Compared with the Dali server, we obtain similar results within the same group with much greater simplicity and efficiency.
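
The abstract does not say which three fractal features are used. As an illustration of the invariance property only, the sketch below computes the Katz fractal dimension of a 3-D curve, a well-known curve descriptor swapped in here as a stand-in; it is unchanged under rotation, translation, and uniform scaling because it depends only on ratios of distances. The function names and toy coordinates are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def katz_fd(points):
    """Katz fractal dimension of a polyline with at least three points.

    D = log(n) / (log(n) + log(d / L)), where n is the number of steps,
    L the total path length, and d the largest distance from the first point.
    """
    steps = [dist(p, q) for p, q in zip(points, points[1:])]
    n = len(steps)
    total_length = sum(steps)
    extent = max(dist(points[0], p) for p in points[1:])
    return math.log(n) / (math.log(n) + math.log(extent / total_length))

# Toy backbone traces (made-up coordinates, not real C-alpha positions).
straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
zigzag = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0),
          (3.0, 1.0, 0.0), (4.0, 0.0, 0.0)]

print(katz_fd(straight))  # a straight line has dimension 1
print(katz_fd(zigzag))    # a more convoluted path gives a larger value
```

Because a descriptor of this kind is computed per chain, two structures can be compared by comparing a few numbers, which is consistent with the abstract's point that no chain alignment is needed.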


Structural Analysis of FGFR1 Kinase Activation through Molecular Dynamics Simulation
Peng Wang, Zhengchang Su, Juntao Guo, and Ying Xu
Computational System Biology Lab, UGA

Fibroblast growth factor receptors (FGFRs) are receptor tyrosine kinases that are critical regulators of signal transduction pathways mediating cellular homeostasis. Constitutively active forms of FGFRs generated via mutation, gene fusion, and other genetic alterations have been observed in many human cancers. The crystal structure of FGFR1 suggests that FGFR1 exists in an equilibrium between active and inhibited conformations, which serves as the basis for activation upon dimerization. We have performed a 4 ns molecular dynamics (MD) simulation of the kinase domain of FGFR1 to study the mechanism that regulates its movement towards the active conformation. The simulation was performed with NAMD using the CHARMM22 force field and the NPT ensemble. The particle mesh Ewald (PME) method was used to treat long-range electrostatic interactions. Our simulation revealed that the activation loop moved away from its inhibited conformation and adopted an open conformation about 2 ns into the simulation. The C-terminus of the activation loop rotated about 90 degrees to open up the kinase cavity for substrate access. The main interaction that brought about this movement was the hydrogen bond between D652 OD2 and T657 OG1. Our study has clarified the key atomic events that trigger the movement of the activation loop. This dynamic information will facilitate the design of new inhibitors for the treatment of cancer.
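
A hydrogen bond such as the one between D652 OD2 and T657 OG1 is typically followed through a trajectory by monitoring the distance between the two atoms frame by frame. The sketch below shows that bookkeeping under simple assumptions: a distance-only criterion with a 3.5 Å cutoff (a common convention; real analyses usually add an angle criterion) and made-up toy coordinates rather than values from the simulation.

```python
import math

HBOND_CUTOFF = 3.5  # angstroms; assumed donor-acceptor distance cutoff

def hbond_frames(donor_traj, acceptor_traj, cutoff=HBOND_CUTOFF):
    """Return one boolean per frame: True when the two atoms are within the cutoff."""
    return [math.dist(d, a) <= cutoff for d, a in zip(donor_traj, acceptor_traj)]

def occupancy(flags):
    """Fraction of frames in which the hydrogen bond is present."""
    return sum(flags) / len(flags)

# Toy per-frame coordinates in angstroms (illustrative only).
t657_og1 = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
d652_od2 = [(6.0, 0.0, 0.0), (4.0, 0.0, 0.0), (2.8, 0.0, 0.0)]

flags = hbond_frames(t657_og1, d652_od2)
print(flags)             # [False, False, True]
print(occupancy(flags))  # fraction of frames with the bond formed
```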


Automatic Prediction of Functional Site Regions in Low Resolution Protein Structures
J. S. Sodhi, K. Bryson, L. J. McGuffin, J. J. Ward, L. Wernisch and D. T. Jones
University College London, Bioinformatics Group

World-wide structural genomics initiatives are rapidly accumulating structures for which limited functional information is available. Additionally, state-of-the-art structure prediction programs are now capable of generating at least low resolution structural models of target proteins. Accurate detection and classification of functional sites within both solved and modelled protein structures therefore represent an important challenge. We present a fully automatic site detection method, FuncSite, that uses neural network classifiers to predict the location and type of functionally important sites in protein structures. The method is designed primarily to require only relative residue positions, without the need for specific side-chain atoms to be present. The functional site encoding represents conservation using PSI-BLAST PSSMs of site residues, as well as solvent accessibility and secondary structure assignments. We have rigorously benchmarked FuncSite on a set of metal binding sites spanning numerous SCOP superfamilies. The method has also been extended to the prediction of protein-DNA interface regions, adenylate classification and the identification of enzyme active sites. To demonstrate effective site detection in low resolution structural models, FuncSite was used to screen model proteins generated using mGenTHREADER on a set of newly released structures. We found effective metal site detection even for moderate quality protein models, illustrating the robustness of the method. We have also investigated the use of site detection to improve fold recognition predictions. Analysis on a set of structures from LiveBench, an on-going assessment of structure prediction methods, indicates statistically significant improvements.


BPAP: A Computational Tool for Whole Genome Annotation and Analysis
Barrett Abel and Martin Gollery
University of Nevada, Reno

We have created a Biological Protein Analysis Pipeline (BPAP) to provide improved annotation for genome projects with a minimum of technical complexity. BPAP is designed as an extensible software package that handles the non-scientific complexity of interfacing with hardware, computational clusters, software packages, raw data, and file formats, thereby leaving the scientist a powerful and simple interface for the analysis and mining of genomic information. Because the component analyses and predictions are computationally expensive, many being of O(n^2) or O(n^3) complexity, we enabled the analysis to be performed in a massively parallel fashion. To accommodate this effectively, we integrated BPAP with Beowulf computational clusters using open-source scheduling managers, currently MAUI and SGE. The user interface was designed to run on a desktop computer (see snapshots) and remotely control the cluster for processing. The computational component of the software package was designed to interface transparently with the GUI, hiding the Unix and computational complexity from the user. BPAP was written in C++, C, and Perl with the Qt windowing toolkit, which allows the software package to run on Windows, Linux (64/32-bit), Mac OS X, and IRIX. The resulting annotation data is interpreted by computer and presented in a useful manner, rather than as a conglomeration of separate analyses. As a first use of BPAP, we will be annotating the complete Plasmodium falciparum [strain 3D7] genome, which involves annotating 11,438 hypothetical and unknown genes. We plan to extend the functionality of BPAP to searching a genome for specific criteria (e.g., membership in a LEA family) and to implement several algorithms, particularly ones that use publicly available databases to infer homology relationships.


Secondary Structure Assignment Based on the Delaunay Tessellation of Protein Structures
Todd J. Taylor and Iosif Vaisman
George Mason University

Protein structures have been analyzed with a geometrical construction known as the Delaunay tessellation. Each amino acid is abstracted to a point, and these points are then joined by edges in a unique way to form a set of non-overlapping, irregular, space-filling tetrahedra. A five-element descriptor derived from the Delaunay tessellation, which characterizes main chain topology in the neighborhood of a residue, can then be assigned to each residue in the protein. Rules that accurately map this descriptor to the DSSP secondary structure assignment can be devised. We have created several such mappings and compared their degree of agreement with that of other existing methods of secondary structure assignment such as STRIDE, P-SEA, SECSTR, DEFINE and XTLSSTR. Agreement of tessellation-based secondary structure assignment with DSSP is comparable to that of existing methods (~90% for helices and ~80% for strands). This is remarkable because the descriptor is based solely on alpha-carbon backbone connectivity/topology. No angles, lengths, or putative hydrogen bonds are used to derive it.


Molecular Modeling of Full-length OxyR from Shewanella oneidensis MR-1 and Molecular Dynamics Studies of the Activation Domain
Jun-tao Guo and Ying Xu
University of Georgia

The OxyR protein, first identified as a key regulator of the peroxide stress response in Salmonella typhimurium, is found in many prokaryotic organisms. OxyR belongs to one of the largest families of prokaryotic DNA binding proteins, the LysR-type transcriptional regulators (LTTRs). LTTR family proteins have an N-terminal DNA binding domain and a C-terminal activation domain. OxyR is regarded as an archetypal example of a redox regulatory protein. It is activated through oxidation by H2O2 and then induces the transcription of genes necessary for the bacterial defense against oxidative stress. In the oxidized form, an intramolecular disulfide bond is formed between Cys-199 and Cys-208. Although OxyR has been studied extensively for many years, the exact mechanism is still not clear. In this study, we constructed a structural model for the full-length OxyR from Shewanella oneidensis MR-1 using threading and comparative modeling techniques. We also performed molecular dynamics simulations on the activation domain of OxyR, using the GROMACS force field under periodic boundary conditions. The Particle Mesh Ewald (PME) method was used to treat long-range electrostatic interactions. The simulation results show that the oxidized form is very stable while the reduced form is quite flexible. Our results suggest that the reduced form provides the structural flexibility needed for disulfide bond formation, which in turn regulates its function.


A Combinatorial Method for Protein Loop Prediction
Chiuan-Jung Chen, Jinn-Moon Yang, and Cheng-Yan Kao

A major limitation of current comparative modeling methods is the accuracy with which regions that are structurally divergent from homologues of known structure can be modeled; we call this the loop modeling problem. Loop modeling poses two difficulties: the first is how to generate a feasible loop conformation, and the second is how to find the conformation closest to the native one. Here we present a method that combines several algorithms to address these two difficulties. For the first, we adopt a robotics algorithm for protein loop closure called CCD (Cyclic Coordinate Descent), which iteratively changes the phi or psi angles from the start residue to the end residue until the loop closes. For the second, we use a formalism that computes the probability of an amino acid sequence conformation being native-like, given a set of pairwise atom-atom distances, together with a search strategy, FCEA, that finds the conformation with the best probability among the randomly built conformations. We evaluate the method by predicting only the backbone conformations of two loops, 3BLM 131-135 and 8TLN 248-255. The results show that we obtain good RMSDs to the native loop conformations (0.3 Å and 1.1 Å, respectively) in much less computational time than other loop prediction methods (only 618 seconds and 1302 seconds, respectively).
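
The idea behind cyclic coordinate descent can be shown on a simplified problem. The sketch below is the textbook robotics form of CCD for a planar chain of rigid links: each joint in turn is rotated so that the free end points at the target, and the sweep is repeated until the end reaches it. It is not the authors' implementation; protein loop closure applies the same idea to phi/psi torsions in 3-D and matches several anchor atoms at once. The function name, link lengths, and target are hypothetical.

```python
import math

def ccd_close(points, target, tol=1e-3, max_sweeps=1000):
    """Move the last point of a 2-D chain onto `target` by cyclic coordinate descent.

    `points[0]` stays fixed; link lengths are preserved because every update
    is a rigid rotation of the downstream part of the chain about one joint.
    """
    pts = [list(p) for p in points]
    for _ in range(max_sweeps):
        if math.dist(pts[-1], target) < tol:
            break
        # Visit joints from the one nearest the free end back to the base.
        for i in range(len(pts) - 2, -1, -1):
            px, py = pts[i]
            ex, ey = pts[-1][0] - px, pts[-1][1] - py   # joint -> current end
            tx, ty = target[0] - px, target[1] - py     # joint -> target
            # Signed angle that rotates the end vector onto the target vector.
            angle = math.atan2(ex * ty - ey * tx, ex * tx + ey * ty)
            c, s = math.cos(angle), math.sin(angle)
            for j in range(i + 1, len(pts)):
                dx, dy = pts[j][0] - px, pts[j][1] - py
                pts[j] = [px + c * dx - s * dy, py + s * dx + c * dy]
    return pts

# A straight five-link chain with unit links, closed onto a reachable target.
chain = [(float(i), 0.0) for i in range(6)]
closed = ccd_close(chain, (3.0, 1.0))
print(closed[-1])
```

Each single-joint update can only reduce the end-to-target distance, which is why the method is attractive for generating many closed candidate conformations quickly before they are scored.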


Prediction of Functional Sites by Analysis of Sequence and Structure Conservation
Anna R. Panchenko, Fyodor Kondrashov, and Stephen Bryant
NCBI, NIH

The recent growth in the number of protein sequence families requires new methods of detailed functional annotation. We present a method for prediction of functional sites in a set of aligned protein sequences. The method selects sites which are both well conserved and clustered together in space, as inferred from the 3D structures of proteins included in the alignment. We test the method using 86 alignments from the NCBI CDD database, where the sites of experimentally determined ligand and/or macromolecular interactions are annotated. In agreement with earlier investigations, we find that functional site predictions are most successful when overall background sequence conservation is low, such that sites under evolutionary constraint become apparent. In addition, we find that averaging of conservation values across spatially clustered sites improves predictions under certain conditions: when overall conservation is relatively high and when the site in question involves a large macromolecular binding interface. Under these conditions it is better to look for clusters of conserved sites than to look for individual conserved sites.
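
The two ingredients described above, per-column conservation and averaging over spatial neighbors, can be sketched as follows. The scoring function (one minus normalized Shannon entropy), the 8 Å neighborhood radius, and the toy alignment and coordinates are all assumptions made for illustration; the abstract does not specify the conservation measure or the clustering rule, and gaps are not treated specially here.

```python
import math
from collections import Counter

def column_conservation(column):
    """Conservation of one alignment column: 1 - H / log(20), H = Shannon entropy."""
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(20)

def smoothed_conservation(scores, coords, radius=8.0):
    """Average each site's score with those of all sites within `radius` (self included)."""
    smoothed = []
    for ci in coords:
        neighbors = [s for s, cj in zip(scores, coords) if math.dist(ci, cj) <= radius]
        smoothed.append(sum(neighbors) / len(neighbors))
    return smoothed

# Toy alignment (rows are sequences) and made-up residue coordinates in angstroms.
alignment = ["HCA", "HDA", "HEA", "HKG"]
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (30.0, 0.0, 0.0)]

scores = [column_conservation(col) for col in zip(*alignment)]
print(scores)
print(smoothed_conservation(scores, coords))
```

Averaging rewards sites that sit among other conserved sites, which matches the observation that cluster-level signals help most when background conservation is high.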


A Novel Computational Framework for Structural Classification of Proteins Using Local Geometric Parameter Matching
Sumeet Dua, Naveen Kandiraju, and Vineet Jain
Data Mining Research Laboratory, Louisiana Tech University

The objective of this study was to develop a novel and fast computational framework for classification of proteins using a series of secondary structure geometric parameters represented by a previously unexplored dihedral angle of a protein sequence. Methods: A dihedral angle is calculated between the two planes defined by the atom triplets [N(i), C(i), N(i+1)] and [C(i), N(i+1), C(i+1)] of adjacent amino acids (i and i+1) in a protein structure. The series of such angles is segmented into overlapping subsequences, followed by the identification of areas of relatively stationary harmonic behavior (called trails). These trails are then organized in a unique translation- and scale-invariant indexing schema to enable searching and reporting of local alignments. Results: The technique was tested on 25 proteins belonging to 5 different families randomly selected from the Alpha, Beta, Alpha and Beta (alpha/beta), and Multi-domain (alpha and beta) protein classes. The degree of local similarity is calculated using our indexing schema, and the results are reported with the approximate position of each similarity match. The experimental results demonstrate a cumulative true positive rate of 88% in classification, with a very low degree of false negatives. The degrees of proximity of the false negatives are also reported, to show the robustness of the proposed technique. Conclusions: The proposed computational framework for the local alignment of two sequences can serve as a good classifier for assigning protein sequences to their respective families. The approach achieves a many-fold reduction in the dimensionality of the similarity search space, with a high degree of accuracy in protein structural classification.
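
The angle between the planes [N(i), C(i), N(i+1)] and [C(i), N(i+1), C(i+1)] is the dihedral angle of the four points N(i), C(i), N(i+1), C(i+1). A minimal sketch of that calculation, using the standard atan2 form of the dihedral formula, is given below. The residue representation (a dict with "N" and "C" coordinates) and the function names are hypothetical, and whether C denotes the carbonyl carbon or the alpha carbon is not stated in the abstract.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees between planes (p0, p1, p2) and (p1, p2, p3)."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)      # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def angle_series(residues):
    """One angle per adjacent residue pair (i, i+1), as described in the Methods."""
    return [dihedral(r["N"], r["C"], s["N"], s["C"])
            for r, s in zip(residues, residues[1:])]

# Toy coordinates (not from a real structure) giving a 90 degree angle.
residues = [
    {"N": (0.0, 1.0, 0.0), "C": (0.0, 0.0, 0.0)},
    {"N": (1.0, 0.0, 0.0), "C": (1.0, 0.0, 1.0)},
]
print(angle_series(residues))
```

The resulting series has one value per residue pair, so a protein of n residues yields a signal of length n - 1 that can then be segmented into overlapping windows.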


Large-scale Testing of Chemical Shift Prediction Algorithms and Improved Machine Learning-based Approaches to Shift Prediction
K. Arun and Christopher J. Langmead
School of Computer Science / Biological Sciences, Carnegie Mellon University

Nuclear chemical shifts in proteins are determined by their covalent structure, through-space interactions, and more generally their three-dimensional structures. While the relationship between chemical shift and protein 3D structure is not fully understood, the dependence of shift on structure also makes shift prediction a non-trivial problem. In this study, three existing chemical shift prediction algorithms are tested against a large dataset of shifts obtained from the RefDB chemical shift database. RefDB entries were linked to corresponding protein structures from the Protein Data Bank (PDB), which were processed through each of the three chemical shift predictors: SHIFTS, SHIFTX and PROSHIFT. The predicted shift values were paired with the corresponding experimentally observed shifts, and root mean square error (RMSE) values were calculated per atom type. The atom types evaluated were the amide nitrogen (15N), the amide proton (HN), the C-alpha carbon and the alpha proton. The numbers of shift values employed for each atom type were 72,000, 49,000, 44,000 and 60,000, respectively. Notable results include the observation that RMSE values were higher than those reported in the original papers, across atom types and for each of the predictors used. This may be accounted for by the fact that the dataset of chemical shifts employed in this study is much larger than the original sets of shifts used to benchmark the prediction algorithms. A support vector machine (SVM) based approach was then employed to try to improve upon the accuracy of shift predictions observed with the three algorithms for HN and 15N nuclei. Training and predicting on the full dataset for both 15N and HN resulted in statistically significant improvements in RMSE of up to 10% over the most accurate individual prediction algorithm. More elaborate tuning of the SVM parameters should further improve the accuracy of chemical shift prediction.
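
The per-atom-type evaluation described above reduces to grouping (predicted, observed) pairs by atom type and taking the root mean square of the differences. A minimal sketch follows; the record layout and the numbers are made up for illustration and are not taken from RefDB or from any of the predictors.

```python
import math
from collections import defaultdict

def rmse_by_atom_type(records):
    """Compute RMSE per atom type from (atom_type, predicted, observed) records."""
    squared_errors = defaultdict(list)
    for atom_type, predicted, observed in records:
        squared_errors[atom_type].append((predicted - observed) ** 2)
    return {atom: math.sqrt(sum(errs) / len(errs))
            for atom, errs in squared_errors.items()}

# Toy shifts in ppm (illustrative values only).
records = [
    ("N", 120.0, 118.0),
    ("N", 115.0, 117.0),
    ("HN", 8.0, 8.5),
]
print(rmse_by_atom_type(records))  # {'N': 2.0, 'HN': 0.5}
```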

