Mapping the Sequence-Structure-Function Paradigm by Intrinsic Properties of Anisotropic Networks

Description
Proteins are the machines of living systems that carry out a diverse set of essential biochemical functions. Furthermore, the diversity of their functions has grown over time via molecular evolution. This thesis aims to explore fundamental questions in protein science regarding the mechanisms of protein evolution, particularly addressing how substitutions in sequence modulate function through structure and structural dynamics. In the work presented here, the first goal is to develop a set of tools that connect sequence to structure; these tools are implemented in two major projects, protein structural refinement and protein structural design. Both projects highlight the importance of capturing key pairwise interactions within a given protein system. The second major goal of this work is to understand how sequence and structural dynamics give rise to protein function and, importantly, how Nature can utilize allostery to evolve towards a new function. Here I employ several in-house and novel computational tools to shed light on the mechanisms of allostery, particularly dynamic allostery in the absence of structural rearrangements. This analysis is applied to several protein systems, including Pin1, LacI, CoV-1, CoV-2, and TEM-1. I show that the dynamics of protein systems may be altered fundamentally by distal perturbations such as ligand binding or point mutations. These perturbations lead to changes in local interactions, which cascade through the protein's 3-D interaction network and give rise to flexibility changes at distal sites, particularly at functional or active-site residue positions, thereby altering protein function. This network picture of the protein is further explored through asymmetric dynamic coupling, which is shown to be a marker of allosteric interactions between distal residue pairs.
Within this network picture, the dependence of a mutation's effect on its sequence context becomes critical to understanding the functional outcome of mutations. Here I design a computational tool, EpiScore, which captures these effects and correlates them with experimentally measured epistasis in two protein systems, dihydrofolate reductase (DHFR) and TEM-1. Ultimately, the work presented in this thesis shows that both allostery and epistasis may be considered, and accurately modeled, as intrinsic properties of anisotropic networks.
Date Created
2021
Agent

The Database of Macromolecular Motions: New Features Added at the Decade Mark

Description

The database of molecular motions, MolMovDB (http://molmovdb.org), has been in existence for the past decade. It classifies macromolecular motions and provides tools to interpolate between two conformations (the Morph Server) and predict possible motions in a single structure. In 2005, we expanded the services offered on MolMovDB. In particular, we further developed the Morph Server to produce improved interpolations between two submitted structures. We added support for multiple chains to the original adiabatic mapping interpolation, allowing the analysis of subunit motions. We also added the option of using FRODA interpolation, which allows for more complex pathways, potentially overcoming steric barriers. We added an interface to a hinge prediction service, which acts on single structures and predicts likely residue points for flexibility. We developed tools to relate such points of flexibility in a structure to particular key residue positions, i.e. active sites or highly conserved positions. Lastly, we began relating our motion classification scheme to function using descriptions from the Gene Ontology Consortium.

Date Created
2006-01-01
Agent

Collective Dynamics Differentiates Functional Divergence in Protein Evolution

Description

Protein evolution is most commonly studied by analyzing related protein sequences and generating ancestral sequences through Bayesian and Maximum Likelihood methods, and/or by resurrecting ancestral proteins in the lab and performing ligand binding studies to determine function. Structural and dynamic evolution have largely been left out of molecular evolution studies. Here we incorporate both structure and dynamics to elucidate the molecular principles behind the divergence in the evolutionary path of the steroid receptor proteins. We determine the likely structure of three evolutionarily diverged ancestral steroid receptor proteins using the Zipping and Assembly Method with FRODA (ZAMF). Our predictions are within ∼2.7 Å all-atom RMSD of the respective crystal structures of the ancestral steroid receptors. Beyond static structure prediction, a particular feature of ZAMF is that it generates protein dynamics information. We investigate the differences in conformational dynamics of diverged proteins by obtaining the most collective motion through essential dynamics. Strikingly, our analysis shows that evolutionarily diverged proteins of the same family do not share the same dynamic subspace, while those sharing the same function are simultaneously clustered together and distant from those that have functionally diverged. Dynamic analysis also enables the mutations that most affect dynamics to be identified. It correctly predicts all mutations (functional and permissive) necessary to evolve new function and ∼60% of permissive mutations necessary to recover ancestral function.

Date Created
2012-03-29
Agent

Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways

Description

Diverse classes of proteins function through large-scale conformational changes, and various sophisticated computational algorithms have been proposed to enhance sampling of these macromolecular transition paths. Because such paths are curves in a high-dimensional space, it has been difficult to quantitatively compare multiple paths, a necessary prerequisite to, for instance, assessing the quality of different algorithms. We introduce a method named Path Similarity Analysis (PSA) that enables us to quantify the similarity between two arbitrary paths and extract the atomic-scale determinants responsible for their differences. PSA utilizes the full information available in 3N-dimensional configuration space trajectories by employing the Hausdorff or Fréchet metrics (adopted from computational geometry) to quantify the degree of similarity between piecewise-linear curves. It thus completely avoids relying on projections into low-dimensional spaces, as used in traditional approaches.

To elucidate the principles of PSA, we quantified the effect of path roughness induced by thermal fluctuations using a toy model system. Using, as an example, the closed-to-open transitions of the enzyme adenylate kinase (AdK) in its substrate-free form, we compared a range of protein transition path-generating algorithms. Molecular dynamics-based dynamic importance sampling (DIMS) MD and targeted MD (TMD) and the purely geometric FRODA (Framework Rigidity Optimized Dynamics Algorithm) were tested along with seven other methods publicly available on servers, including several based on the popular elastic network model (ENM). PSA with clustering revealed that paths produced by a given method are more similar to each other than to those from another method and, for instance, that the ENM-based methods produced relatively similar paths. PSA applied to ensembles of DIMS MD and FRODA trajectories of the conformational transition of diphtheria toxin, a particularly challenging example, showed that the geometry-based FRODA occasionally sampled the pathway space of force field-based DIMS MD. For the AdK transition, the new concept of a Hausdorff-pair map enabled us to extract the molecular structural determinants responsible for differences in pathways, namely a set of conserved salt bridges whose charge-charge interactions are fully modelled in DIMS MD but not in FRODA. PSA has the potential to enhance our understanding of transition path sampling methods, to validate them, and to provide a new approach to analyzing conformational transitions.
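
As an illustration of the two path metrics named in this abstract, the following minimal Python sketch computes the discrete Hausdorff and discrete Fréchet distances between two paths, each given as a list of conformations. It is not the authors' PSA code. The function names, the plain Euclidean distance between flattened coordinate vectors (a structural metric such as RMSD could be substituted), and the three-point toy paths are all assumptions made for illustration.

```python
import math


def point_distance(a, b):
    """Euclidean distance between two conformations (flat coordinate tuples)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def directed_hausdorff(path_p, path_q):
    """Largest distance from any point of path_p to its nearest point on path_q."""
    return max(min(point_distance(p, q) for q in path_q) for p in path_p)


def hausdorff(path_p, path_q):
    """Symmetric Hausdorff distance; ignores the ordering of points along a path."""
    return max(directed_hausdorff(path_p, path_q),
               directed_hausdorff(path_q, path_p))


def discrete_frechet(path_p, path_q):
    """Discrete Frechet distance via dynamic programming; respects path ordering."""
    n, m = len(path_p), len(path_q)
    coupling = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = point_distance(path_p[i], path_q[j])
            if i == 0 and j == 0:
                coupling[i][j] = d
            elif i == 0:
                coupling[i][j] = max(coupling[0][j - 1], d)
            elif j == 0:
                coupling[i][j] = max(coupling[i - 1][0], d)
            else:
                best_previous = min(coupling[i - 1][j],
                                    coupling[i - 1][j - 1],
                                    coupling[i][j - 1])
                coupling[i][j] = max(best_previous, d)
    return coupling[n - 1][m - 1]


# Hypothetical toy paths: two parallel straight lines one unit apart.
path_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
path_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
path_b_reversed = list(reversed(path_b))

print(hausdorff(path_a, path_b), discrete_frechet(path_a, path_b))
print(hausdorff(path_a, path_b_reversed), discrete_frechet(path_a, path_b_reversed))
```

Reversing one of the toy paths leaves the Hausdorff distance unchanged but increases the Fréchet distance, because only the latter takes the order of points along each path into account.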

Date Created
2015-10-21
Agent

Modeling Vitreous Silica Bilayers

Description

Theoretical modeling is presented for a freestanding vitreous silica bilayer which has recently been synthesized and characterized experimentally in landmark work. While such two-dimensional continuous random covalent networks should likely occur on energetic grounds, no synthetic pathway had been discovered previously. Here the bilayer is modeled using a computer assembly procedure initiated from a single layer of a model of amorphous graphene, generated using a bond-switching algorithm from an initially crystalline graphene structure. Each bond is decorated with an oxygen atom and the carbon atoms are relabeled as silicon, generating a two-dimensional network of corner-sharing triangles. Each triangle is transformed into a tetrahedron by raising the silicon atom above each triangular base and adding an additional singly coordinated oxygen atom at the apex. The final step in this construction is to mirror-reflect this layer to form a second layer and attach the two layers to form the bilayer. We show that this vitreous silica bilayer has the additional macroscopic degrees of freedom to easily form a network of identical corner-sharing tetrahedra if there is a symmetry plane through the center of the bilayer going through the layer of oxygen ions that join the upper and lower monolayers. This has the consequence that the upper rings lie exactly above the lower rings, which are tilted in general. The assumption of a network of perfect corner-sharing tetrahedra leads to a range of possible densities that we characterize as a flexibility window, with some similarity to flexibility windows in three-dimensional zeolites. Finally, using a realistic potential, we have relaxed the bilayer to determine the density and other structural characteristics such as the Si-Si pair distribution functions and the Si-O-Si bond angle distribution, which are compared with experimental results obtained by direct imaging.

Date Created
2013-09-18
Agent

Bond Percolation in Higher Dimensions

Description

We collect results for bond percolation on various lattices from two to fourteen dimensions that, in the limit of large dimension d or number of neighbors z, smoothly approach a randomly diluted Erdős-Rényi graph. We include results on bond-diluted hypersphere packs in up to nine dimensions, which show the mean coordination, excess kurtosis, and skewness evolving smoothly with dimension towards the Erdős-Rényi limit.
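
For readers unfamiliar with bond percolation, the following Python sketch shows the basic numerical experiment in the simplest case, a two-dimensional square lattice: each bond is kept with probability p, and a union-find structure tests whether the kept bonds connect the top row to the bottom row. This is only an assumed, minimal illustration and not the method used in the work above. The function names, lattice size, trial count, and the top-to-bottom spanning criterion are all choices made here. For the square lattice the bond percolation threshold is known exactly to be 1/2.

```python
import random


def find(parent, i):
    """Return the root of i, compressing the path along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, a, b):
    """Merge the clusters containing a and b."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_a] = root_b


def spans(size, p, rng):
    """True if kept bonds connect the top row to the bottom row of a size x size lattice."""
    n_sites = size * size
    top, bottom = n_sites, n_sites + 1  # two virtual sites tied to the boundary rows
    parent = list(range(n_sites + 2))
    for row in range(size):
        for col in range(size):
            site = row * size + col
            if row == 0:
                union(parent, site, top)
            if row == size - 1:
                union(parent, site, bottom)
            # Keep the bond to the right neighbour with probability p.
            if col + 1 < size and rng.random() < p:
                union(parent, site, site + 1)
            # Keep the bond to the lower neighbour with probability p.
            if row + 1 < size and rng.random() < p:
                union(parent, site, site + size)
    return find(parent, top) == find(parent, bottom)


def spanning_fraction(size, p, trials, seed=0):
    """Fraction of random bond configurations that span, using a seeded generator."""
    rng = random.Random(seed)
    return sum(spans(size, p, rng) for _ in range(trials)) / trials


for p in (0.3, 0.5, 0.7):
    print(p, spanning_fraction(size=32, p=p, trials=200))
```

On a finite lattice the spanning fraction rises from near 0 to near 1 over a window around the threshold, and that window narrows as the lattice grows.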

Date Created
2013-09-18
Agent

Calculating infrared spectra of proteins and other organic molecules based on normal modes

Description
The goal of this theoretical study of infrared spectra was to ascertain to what degree molecules may be identified from their IR spectra and which spectral regions are best suited for this purpose. The frequencies considered range from the lowest-frequency molecular vibrations in the far-IR, terahertz region (below ~3 THz or 100 cm⁻¹) up to the highest-frequency vibrations (~120 THz or 4000 cm⁻¹). An emphasis was placed on the IR spectra of chemical and biological threat molecules in the interest of detection and prevention. To calculate IR spectra, the technique of normal mode analysis was applied to organic molecules ranging in size from 8 to 11,352 atoms. The IR intensities of the vibrational modes were calculated in terms of the derivative of the molecular dipole moment with respect to each normal coordinate. Three sets of molecules were studied: the organophosphorus G- and V-type nerve agents and chemically related simulants (15 molecules ranging in size from 11 to 40 atoms); 21 other small molecules ranging in size from 8 to 24 atoms; and 13 proteins ranging in size from 304 to 11,352 atoms. Spectra for the first two sets of molecules were calculated using quantum chemistry software, and spectra for the last two sets were calculated using force fields. The "middle" set used both methods, allowing for comparison between them and with experimental spectra from the NIST/EPA Gas-Phase Infrared Library. The calculated spectra of proteins, for which only force field calculations are practical, reproduced the experimentally observed amide I and II bands, but they were shifted by approximately +40 cm⁻¹ relative to experiment. Considering the entire spectrum of protein vibrations, the most promising frequency range for differentiating between proteins was approximately 600-1300 cm⁻¹, where water has low absorption and the proteins show some differences.
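
The abstract quotes frequencies both in terahertz and in inverse centimetres. The short Python sketch below shows the conversion between the two, which is simply division by the speed of light expressed in cm/s. The function names are chosen here for illustration and are not taken from the thesis.

```python
SPEED_OF_LIGHT_CM_PER_S = 2.99792458e10  # exact SI value, expressed in cm/s


def thz_to_wavenumber(frequency_thz):
    """Convert a frequency in THz to a wavenumber in inverse centimetres."""
    return frequency_thz * 1.0e12 / SPEED_OF_LIGHT_CM_PER_S


def wavenumber_to_thz(wavenumber_per_cm):
    """Convert a wavenumber in inverse centimetres to a frequency in THz."""
    return wavenumber_per_cm * SPEED_OF_LIGHT_CM_PER_S / 1.0e12


# The two limits quoted in the abstract: about 3 THz and about 120 THz.
for frequency in (3.0, 120.0):
    print(frequency, "THz =", round(thz_to_wavenumber(frequency), 1), "per cm")
```

Both rounded pairs in the abstract are consistent with this conversion: 3 THz corresponds to roughly 100 inverse centimetres and 120 THz to roughly 4000.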
Date Created
2012
Agent