Computational Analysis and Classification of p53 Mutants According to Primary Structure
Krishna Gopalakrishnan, Alireza Darvish, Kayvan Najarian
University of North Carolina At Charlotte
There is a pressing need for fast and accurate methods for classification and analysis of mutated proteins.
A single-base mutation may or may not change the structure and function of the wild type. Widely used
classification techniques based on multiple alignment can give false results in such cases because the primary
sequences of the mutants and the wild type are very similar. We present a signal processing based technique
for the classification and analysis of proteins produced by mutation of the wild type. The proposed technique
uses signal processing methods along with biochemical properties of individual amino acids for the analysis.
Each amino acid in the mutant protein is replaced with the value of a corresponding biochemical property,
such as molecular weight or hydrophobicity. This substitution
generates a set of biochemical signals from which signal processing features such as complexity,
mobility, fractal dimension, and wavelet transform are extracted. In an experimental study of the p53 protein, mutants
resulting from single mutations of the eight residues of the β-strand 326-333 to alanine were analyzed for their
ability to stimulate transcription, to inhibit the growth of Saos-2 cells, and to repress the promoter of
the multidrug resistance gene. Our computational technique produces three clusters. The first
cluster contains mutants L330A and I332A, the second contains F328A, and the last contains E326A, Y327A, T329A, Q331A,
and R333A. Our classification results, based solely on the analysis of primary sequences, match those
of the experimental studies.
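The encoding step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the Kyte-Doolittle hydropathy scale as the biochemical property and the Hjorth definitions of mobility and complexity, neither of which is specified in the abstract. The wild-type strand "EYFTLQIR" is inferred from the mutant names listed above, and all function names are hypothetical.

```python
import math
from statistics import pvariance

# Kyte-Doolittle hydropathy index (choosing this scale is an assumption).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def to_signal(sequence, scale=KYTE_DOOLITTLE):
    """Replace each residue with its property value to obtain a numeric signal."""
    return [scale[residue] for residue in sequence]

def first_difference(signal):
    """Discrete derivative of the signal."""
    return [b - a for a, b in zip(signal, signal[1:])]

def hjorth_mobility(signal):
    """Hjorth mobility: sqrt(var(derivative) / var(signal))."""
    return math.sqrt(pvariance(first_difference(signal)) / pvariance(signal))

def hjorth_complexity(signal):
    """Hjorth complexity: mobility of the derivative divided by mobility of the signal."""
    return hjorth_mobility(first_difference(signal)) / hjorth_mobility(signal)

def substitute(sequence, residue_number, new_residue, first_residue_number):
    """Return the sequence with one residue replaced (residue numbering as in the protein)."""
    index = residue_number - first_residue_number
    return sequence[:index] + new_residue + sequence[index + 1:]

STRAND_START = 326
WILD_TYPE_STRAND = "EYFTLQIR"  # residues 326-333, inferred from the mutant names

for residue_number in range(STRAND_START, STRAND_START + len(WILD_TYPE_STRAND)):
    mutant = substitute(WILD_TYPE_STRAND, residue_number, "A", STRAND_START)
    signal = to_signal(mutant)
    name = f"{WILD_TYPE_STRAND[residue_number - STRAND_START]}{residue_number}A"
    print(name, round(hjorth_mobility(signal), 3), round(hjorth_complexity(signal), 3))
```

In practice the features would be computed over the full protein sequence and for several properties, then passed to a clustering step; the eight-residue strand is used here only to keep the example short.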
In Silico Prediction of Surface Residue Clusters for Enzyme-substrate Specificity
Gong-Xin Yu, Byung-Hoon Park, Praveen Chandramohan, Rajesh
Computational Biology Institute, Oak Ridge National Laboratory
One of the most remarkable properties of enzyme-substrate binding is the high substrate specificity
among highly homologous enzymes. Identification of key residues or their clusters for substrate recognition
presents an opportunity to understand their basic molecular mechanisms and guide mutagenesis experiments into
relevant residues and, thus, accelerate progress in bioprocess engineering and drug design. We reason that
residues involved in such recognition are most likely clustered on a protein surface and involve possible
interactions among their neighboring residues. We report a computational procedure that predicts such clusters
of specificity-determining residues among highly homologous functional protein groups. Current methods identify
conserved residues but largely ignore non-conserved residues and their potential contributions. Our method has
the ability to overcome those limitations. In case studies, we have investigated two highly homologous enzymatic
protein pairs (referred to as functional subtypes), guanylyl cyclases vs. adenylyl cyclases and lactate dehydrogenases
vs. malate dehydrogenases, and applied this algorithm to plant and cyano-bacterial RuBisCo protein complexes,
which differ dramatically in the CO2/O2 specificity. Without using experimental data, we identified mono-residue
clusters as well as multi-residue ones and obtained a considerable concurrence with experimental results.
Specifically, some of the identified clusters, primarily the mono-residue ones, can cover residues that are
directly involved in substrate-enzyme interactions. Others, mainly multi-residue ones, cover residues vital for
domain-domain and regulator-enzyme interactions, indicating potential roles for these functionally non-specific yet
complementary residues in determining specificity.
Comparing 3D Protein Structures Similarity by Using Fractal Features
Chenyang Cui, Donghui Wang, Yingsha Zhang, and Jiaoying Shi
Donald Danforth Plant Science Center
In this paper, we propose a new method for measuring similarity in 3-D protein structure comparison. Unlike
existing methods, our method is grounded in the theory of fractal geometry, since proteins
have an intrinsic self-similarity in the compactness and packing of their structure. We propose three fractal
features of the protein backbone. These features are invariant to rotation, translation, and
scaling of the protein molecule, and the method is simple to implement. The method is very fast because it requires
neither alignment of the chains nor any chain-chain comparison. Experimental results show that our method
is very effective in classifying 3-D protein structures and is suitable for global matching of 3-D protein
structures. Compared with the Dali server, we obtain similar results within the same group with much greater
simplicity and efficiency.
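The abstract above does not define its three fractal features, so the following is only a sketch of one plausible backbone descriptor: a chain fractal dimension estimated from how the average spatial distance between atoms grows with their separation along the chain. The function names, the fitting range, and the toy helix geometry are assumptions made for illustration. Because the estimate depends only on the slope of a log-log fit of inter-atomic distances, it is unchanged by rotation, translation, and uniform scaling, which is the invariance property the abstract refers to.

```python
import math

def mean_span(coords, n_bonds):
    """Average straight-line distance between atoms n_bonds apart along the chain."""
    spans = [math.dist(coords[i], coords[i + n_bonds])
             for i in range(len(coords) - n_bonds)]
    return sum(spans) / len(spans)

def chain_fractal_dimension(coords, max_bonds=None):
    """Estimate D from span(n) ~ n**(1/D) by a least-squares fit in log-log space."""
    if max_bonds is None:
        max_bonds = len(coords) // 2  # fitting range is an arbitrary choice
    xs = [math.log(n) for n in range(1, max_bonds + 1)]
    ys = [math.log(mean_span(coords, n)) for n in range(1, max_bonds + 1)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return 1.0 / slope

# Toy backbones (not real protein data): a fully extended chain and an ideal helix.
straight_chain = [(3.8 * i, 0.0, 0.0) for i in range(20)]
toy_helix = [(2.3 * math.cos(math.radians(100 * i)),
              2.3 * math.sin(math.radians(100 * i)),
              1.5 * i) for i in range(20)]

print("extended chain:", round(chain_fractal_dimension(straight_chain), 3))
print("toy helix:", round(chain_fractal_dimension(toy_helix), 3))
```

Since each structure is reduced to a few numbers independently, two structures can be compared without aligning their chains.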
Structural Analysis of FGFR1 Kinase Activation through Molecular Dynamics Simulation
Peng Wang, Zhengchang Su, Juntao Guo, and Ying Xu
Computational System Biology Lab, UGA
Fibroblast growth factor receptors (FGFR) are receptor tyrosine kinases that are critical regulators of
signal transduction pathways mediating cellular homeostasis. Constitutively active forms of FGFRs generated
via mutation, gene fusion and other genetic alterations have been observed in many human cancers. The crystal
structure of FGFR1 suggests that FGFR1 exists in an equilibrium between active and inhibited conformations, and
that this equilibrium serves as the basis for activation upon dimerization. We have performed a 4 ns molecular dynamics (MD)
simulation of the kinase domain of FGFR1 to study the mechanism that regulates its movement towards the active
conformation. The simulation was performed with NAMD using the CHARMM22 force field and the NPT ensemble. The particle
mesh Ewald (PME) method was used to treat long-range electrostatic interactions. Our simulation revealed that
the activation loop moved away from its inhibited conformation and adopted an open conformation about 2 ns
into the simulation. The C-terminus of the activation loop rotated about 90 degrees to open up the kinase cavity
for substrate access. The main interaction that brought about this movement was the hydrogen bond between
D652 OD2 and T657 OG1. Our study has clarified the key atomic events that trigger the movement of the activation
loop. This dynamic information will facilitate the design of new inhibitors for the treatment of cancer.
Automatic Prediction of Functional Site Regions in Low Resolution Protein Structures
J.S Sodhi, K. Bryson, L. J. McGuffin, J.J Ward, L. Wernisch and D.T Jones
University College London, Bioinformatics Group
World-wide structural genomics initiatives are rapidly accumulating structures for which limited functional
information is available. Additionally, state-of-the-art structural prediction programs are now capable of
generating at least low resolution structural models of target proteins. Accurate detection and classification of
functional sites within both solved and modelled protein structures therefore represents an important challenge.
We present a fully automatic site detection method, FuncSite, that uses neural network classifiers to predict the
location and type of functionally important sites in protein structures. The method is designed primarily to
require only relative residue position without the need for specific side-chain atoms to be present. The functional
site encoding represents conservation using PSI-BLAST PSSMs of site residues as well as solvent accessibility and
secondary structure assignments. We have rigorously benchmarked FuncSite on a set of metal binding sites spanning
numerous SCOP super-families. The method has also been extended to the prediction of protein-DNA interface regions,
adenylate classification and the identification of enzyme active sites. To demonstrate effective site
detection in low-resolution structural models, FuncSite was used to screen model proteins generated using
mGenTHREADER on a set of newly released structures. We found effective metal site detection even for
moderate-quality protein models, illustrating the robustness of the method. We have also investigated the use of site
detection to improve fold recognition predictions. Analysis on a set of structures from LiveBench, an on-going
assessment of structure prediction methods, indicates statistically significant improvements.
BPAP: A Computational Tool for Whole Genome Annotation and Analysis
Barrett Abel and Martin Gollery
University of Nevada, Reno
We have created a Biological Protein Analysis Pipeline (BPAP) in order to provide improved annotation for
Genome Projects with a minimum of technical complexity. BPAP is designed as an extensible and expandable
software package to handle the non-scientific complexity of interfacing with hardware, computational
clusters, software packages, raw data, and file formats, thereby leaving the scientist a powerful and
simple interface to the analysis and mining of genomic information. Due to the computationally expensive
nature of component analyses and predictions, many being of O(n^2) or O(n^3) complexity, we enabled the analysis
to be performed in a massively parallel fashion. In order to accommodate this effectively, we integrated BPAP
with Beowulf / Computational Clusters using open sourced scheduling managers, currently MAUI and SGE. The user
interface was designed to run on a desktop computer (see snapshots) and remotely control the cluster
for processing. The computational/ processing component of the software package was designed to interface
transparently with the GUI, hiding the Unix and computational complexity from the user. BPAP was written in
C++, C, and Perl with the Qt windowing toolkit, which allows the software package to run on Windows, Linux (32/64-bit),
Mac OS X, and IRIX. The resulting annotation data is interpreted by computer and presented in a useful manner,
rather than as a conglomeration of separate analyses. As a first use of BPAP, we will be annotating the complete
Plasmodium falciparum [strain 3D7] genome, annotating 11,438 hypothetical and unknown genes. We plan to extend
the functionality of BPAP to searching a genome for genes meeting specific criteria (e.g., membership in a LEA family)
and the implementation of several algorithms, particularly using publicly available databases to infer homology
relationships.
Secondary Structure Assignment Based on the Delaunay Tessellation of Protein Structures
Todd J. Taylor and Iosif Vaisman
George Mason University
Protein structures have been analyzed with a geometrical construction known as the Delaunay tessellation.
Each amino acid is abstracted to a point and these points are then joined by edges in a unique way to form
a set of non-overlapping, irregular, space-filling tetrahedra. A five element descriptor derived from the
Delaunay tessellation can then be assigned to each residue in the protein which characterizes main chain
topology in the neighborhood of that residue. Rules which accurately map this descriptor to the DSSP secondary
structure assignment can be devised. We have created several such mappings and compared the degree of
agreement with other existing methods of secondary structure assignment such as STRIDE, P-SEA, SECSTR, DEFINE
and XTLSSTR. Agreement of tessellation based secondary structure assignment with DSSP is comparable to existing
methods (~90% for helices and ~80% for strands). This is remarkable because the descriptor is based solely on
carbon alpha backbone connectivity/topology. No angles, lengths, or putative hydrogen bonds are used to derive it.
Molecular Modeling of Full-length OxyR from Shewanella oneidensis MR-1 and Molecular Dynamics Studies
of the Activation Domain
Jun-tao Guo and Ying Xu
University of Georgia
The OxyR protein, first identified as a key regulator of the peroxide stress response in Salmonella
typhimurium, is found in many prokaryotic organisms. OxyR belongs to one of the largest families of
prokaryotic DNA binding proteins, the LysR-type transcriptional regulators (LTTRs). LTTR family proteins
have an N-terminal DNA binding domain and a C-terminal activation domain. OxyR is regarded as an archetypal
example of a redox regulatory protein. It is activated through oxidation by H2O2 and then induces the
transcription of genes necessary for the bacterial defense against oxidative stress. In the oxidized form,
an intramolecular disulfide bond between Cys-199 and Cys-208 is formed after activation by H2O2.
Although OxyR has been studied extensively for many years, the exact mechanism is still not clear. In this study,
we constructed a structural model for the full-length OxyR from Shewanella oneidensis MR-1 using threading and
comparative modeling techniques. We also performed molecular dynamics simulations on the activation domain of
OxyR. The simulations were performed using the GROMACS force field under periodic boundary conditions.
The Particle Mesh Ewald (PME) method was used to treat long-range electrostatic interactions. The simulation
results show that the oxidized form is very stable while the reduced form is quite flexible. Our results suggest
that the reduced form provides the structural flexibility needed for disulfide bond formation, which in turn regulates
its function.
A Combinatorial Method for Protein Loop Prediction
Chiuan-Jung Chen, Jinn-Moon Yang, and Cheng-Yan Kao
A major limitation of current comparative modeling methods is the accuracy with which regions that are
structurally divergent from homologues of known structure can be modeled; we call this the
loop modeling problem. Loop modeling poses two difficulties: the first is how to
generate a feasible loop conformation, and the second is how to find the conformation closest to the
native one. Here we present a method that combines several algorithms to address these two difficulties.
For the first difficulty, we adopt a robotics algorithm for protein loop closure called CCD
(Cyclic Coordinate Descent), which iteratively changes the phi or psi angles from
the start residue to the end residue until the loop closes. For the second difficulty, we use a
formalism to compute the probability of an amino acid sequence conformation being native-like, given a set of
pairwise atom-atom distances, and a search strategy, FCEA, to find the conformation with the best probability among
the randomly built conformations. We evaluate this method by predicting only the backbone conformations of two
loops, 3BLM 131-135 and 8TLN 248-255. The results show that we obtain a good RMSD to the native loop
conformation (0.3 Å and 1.1 Å, respectively) with much less computational time than other
loop prediction methods (only 618 seconds and 1302 seconds, respectively).
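The idea behind cyclic coordinate descent can be illustrated with a simplified planar analogue. The sketch below is not the loop-closure procedure used in the abstract above, which rotates phi and psi dihedral angles in three dimensions; it assumes a 2-D chain of rigid links and adjusts one joint at a time so that the chain end moves as close as possible to a target point. The function names, sweep order, and tolerance are illustrative choices.

```python
import math

def rotate_about(point, pivot, angle):
    """Rotate a 2-D point about a pivot by the given angle (radians)."""
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return (pivot[0] + c * dx - s * dy, pivot[1] + s * dx + c * dy)

def ccd_close(points, target, max_sweeps=500, tol=1e-6):
    """Move the last point of the chain onto the target by cyclic coordinate descent.

    Each step changes a single joint angle: the part of the chain after the
    joint is rotated so that the chain end lies on the ray from the joint to
    the target, which is the best position reachable with that one rotation.
    """
    pts = list(points)
    for _ in range(max_sweeps):
        for j in range(len(pts) - 2, -1, -1):
            pivot, end = pts[j], pts[-1]
            angle_to_end = math.atan2(end[1] - pivot[1], end[0] - pivot[0])
            angle_to_target = math.atan2(target[1] - pivot[1], target[0] - pivot[0])
            turn = angle_to_target - angle_to_end
            pts[j + 1:] = [rotate_about(p, pivot, turn) for p in pts[j + 1:]]
        gap = math.hypot(pts[-1][0] - target[0], pts[-1][1] - target[1])
        if gap < tol:
            break
    return pts

# Toy example: four unit-length links, initially straight along the x axis.
chain = [(float(i), 0.0) for i in range(5)]
target = (2.0, 1.5)
closed = ccd_close(chain, target)
gap = math.hypot(closed[-1][0] - target[0], closed[-1][1] - target[1])
print("remaining gap:", gap)
```

Because every step is a rigid rotation, link lengths are preserved throughout, which mirrors how the dihedral-angle version keeps bond lengths and bond angles fixed while the loop end approaches its anchor.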
Prediction of Functional Sites by Analysis of Sequence and Structure Conservation
Anna R. Panchenko, Fyodor Kondrashov, and Stephen Bryant
NCBI, NIH
The recent growth in the number of protein sequence families requires new methods of detailed functional
annotation. We present a method for prediction of functional sites in a set of aligned protein sequences.
The method selects sites which are both well conserved and clustered together in space, as inferred from
the 3D structures of proteins included in the alignment. We test the method using 86 alignments from the
NCBI CDD database, where the sites of experimentally determined ligand and/or macromolecular interactions
are annotated. In agreement with earlier investigations, we find that functional site predictions are most
successful when overall background sequence conservation is low, such that sites under evolutionary
constraint become apparent. In addition, we find that averaging of conservation values across spatially
clustered sites improves predictions under certain conditions: when overall conservation is relatively high
and when the site in question involves a large macromolecular binding interface. Under these conditions it
is better to look for clusters of conserved sites than to look for particular conserved sites.
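The combination of per-site conservation and spatial averaging described in the abstract above can be sketched as follows. The scoring function (one minus normalized Shannon entropy), the 8 Å neighborhood radius, and the toy alignment and coordinates are all assumptions made for illustration; the abstract does not state which conservation measure or distance cutoff was used.

```python
import math
from collections import Counter

def column_conservation(column, alphabet_size=20):
    """Score one alignment column as 1 - (Shannon entropy / maximum entropy)."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return 1.0 - entropy / math.log(alphabet_size)

def spatially_averaged(scores, coords, radius=8.0):
    """Average each site's score with those of all sites within `radius` (self included)."""
    averaged = []
    for ci in coords:
        neighbours = [scores[j] for j, cj in enumerate(coords)
                      if math.dist(ci, cj) <= radius]
        averaged.append(sum(neighbours) / len(neighbours))
    return averaged

# Toy gap-free alignment (rows are sequences) and toy 3-D coordinates, one per column.
alignment = ["ACDEA", "ACDFG", "ACEGH", "ACFHK"]
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0),
          (15.0, 0.0, 0.0), (20.0, 0.0, 0.0)]

columns = ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]
scores = [column_conservation(col) for col in columns]
smoothed = spatially_averaged(scores, coords)

for i, (raw, avg) in enumerate(zip(scores, smoothed)):
    print(i, round(raw, 3), round(avg, 3))
```

Averaging raises the score of a moderately conserved site that sits among conserved neighbors, which is one way to see why cluster-level scores can help for large binding interfaces.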
A Novel Computational Framework for Structural Classification of Proteins Using Local Geometric Parameter Matching
Sumeet Dua, Naveen Kandiraju, and Vineet Jain
Data Mining Research Laboratory, Louisiana Tech University
The objective of this study was to develop a novel and fast computational framework for classification of
proteins using a series of secondary structure geometric parameters represented by a previously unexplored dihedral
angle of a protein sequence. Methods: A dihedral angle is calculated between the two planes defined by the
atom triplets [N(i), C(i), N(i+1)] and [C(i), N(i+1), C(i+1)] of adjacent (i and i+1) amino acids of a protein
structure. The series of such angles is segmented into overlapping subsequences, followed by the identification of
areas of relatively stationary harmonic behavior (called trails). These trails are then organized in a
unique translation- and scale-invariant indexing schema to enable searching and reporting of local alignments.
Results: The technique is tested over 25 proteins belonging to 5 different families randomly selected from Alpha,
Beta, Alpha and Beta (alpha/beta) and Multi-domain proteins (alpha and beta) classes. The degree of local similarity
is calculated using our indexing schema, and the results are reported with approximate positional information for
the similarity match. The experimental results demonstrate a cumulative true positive rate of 88% in classification,
with a very low degree of false negatives. The degrees of proximity of false negatives are also demonstrated, to
reveal the robustness of the proposed technique. Conclusions: The proposed computational framework for the local
alignment of two sequences can serve as a good classifier for assigning protein sequences to their respective families. The
approach achieves a multi-fold reduction in the dimensionality of the similarity search space, with a high degree of
accuracy in protein structural classification.
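The angle defined in the Methods above is the torsion of the four points N(i), C(i), N(i+1), C(i+1) about the C(i)-N(i+1) axis. A minimal sketch of that calculation is given below, assuming the coordinates are available as 3-D tuples; the function names are hypothetical, and the sign convention (which differs between software packages) is an arbitrary choice here.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p0, p1, p2, p3):
    """Angle between the plane (p0, p1, p2) and the plane (p1, p2, p3), in degrees."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    b2_length = math.sqrt(dot(b2, b2))
    unit_b2 = tuple(x / b2_length for x in b2)
    m1 = cross(n1, unit_b2)
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))

def inter_residue_angles(residues):
    """residues: list of (N, C) coordinate pairs, one pair per amino acid, in chain order."""
    return [dihedral_degrees(n_i, c_i, n_next, c_next)
            for (n_i, c_i), (n_next, c_next) in zip(residues, residues[1:])]

# Toy coordinates (not from a real structure): two residues whose planes are perpendicular.
toy_residues = [((1.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
                ((0.0, 0.0, 1.0), (0.0, 1.0, 1.0))]
print(inter_residue_angles(toy_residues))
```

A protein with n residues yields a series of n-1 such angles, which is the sequence that would then be segmented into overlapping subsequences.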
Large-scale Testing of Chemical Shift Prediction Algorithms and Improved Machine Learning-based Approaches
to Shift Prediction
K. Arun and Christopher J. Langmead
School of Computer Science / Biological Sciences, Carnegie Mellon University
Nuclear chemical shifts in proteins are determined by their covalent structure, through-space interactions,
and more generally their three-dimensional structures. While the relationship between chemical shift and protein
3D structure is not yet fully resolved, this dependence of shift on structure also makes shift prediction
a non-trivial problem. In this study, three existing chemical shift prediction algorithms are tested against
a large dataset of shifts obtained from the RefDB chemical shift database. RefDB entries were linked to
corresponding protein structures from the Protein Data Bank (PDB), which were processed through each of the
three chemical shift predictors, SHIFTS, SHIFTX and PROSHIFT. The predicted shift values were paired
with the corresponding experimentally observed shifts, and root mean square error (RMSE) values were calculated
per atom type. The atom types evaluated included the amide nitrogen (15N) and proton (HN), the C-alpha
carbon and the alpha proton. The numbers of shift values employed for each atom type were 72,000, 49,000,
44,000 and 60,000, respectively. Notably, RMSE values were higher than
those reported in the original papers, across atom types and for each of the predictors used. This may be
explained by the fact that the dataset of chemical shifts employed in this study is much larger than
the original sets of shifts used to benchmark the prediction algorithms. A support-vector machine (SVM)
based approach was then employed to try to improve upon the accuracy of shift predictions observed with the
three algorithms for HN and 15N nuclei. Training and predicting on the full dataset for both 15N and HN resulted
in statistically significant improvements in RMSE to the extent of 10% over the most accurate individual
prediction algorithm. More elaborate tuning of the SVM parameters should further improve the accuracy of
chemical shift prediction.
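The per-atom-type evaluation described in the abstract above amounts to grouping predicted and observed shift pairs by atom type and computing the RMSE within each group. The sketch below assumes the pairs are already matched and supplied as (atom type, predicted, observed) records; the record layout, function name, and numbers are made up for illustration and are not taken from RefDB or from any of the predictors.

```python
import math
from collections import defaultdict

def rmse_by_atom_type(records):
    """Return {atom_type: RMSE} for (atom_type, predicted_ppm, observed_ppm) records."""
    squared_errors = defaultdict(list)
    for atom_type, predicted, observed in records:
        squared_errors[atom_type].append((predicted - observed) ** 2)
    return {atom_type: math.sqrt(sum(errors) / len(errors))
            for atom_type, errors in squared_errors.items()}

# Toy records with invented values, in ppm.
toy_records = [
    ("N", 120.0, 123.0),
    ("N", 118.0, 114.0),
    ("HN", 8.2, 8.0),
    ("HN", 7.9, 8.1),
]

for atom_type, rmse in rmse_by_atom_type(toy_records).items():
    print(atom_type, round(rmse, 3))
```

Reporting the error separately per atom type matters because shift ranges differ by an order of magnitude between nuclei, so a pooled RMSE would be dominated by 15N.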