Solvation Thermodynamics and Free Energy Surfaces of Intrinsically Disordered Proteins (IDPs) in Aqueous Solutions

Description
Contrary to the traditional structure-function paradigm for proteins, intrinsically disordered proteins (IDPs) and regions (IDRs) are highly disordered sequences that lack a fixed crystal structure yet perform various biological activities such as cell signaling, regulation, and recognition. The interactions of these disordered regions with water molecules are essential in determining their conformational distribution. Hence, exploring their solvation thermodynamics is crucial for understanding their functions, which are challenging to study experimentally. In this thesis, classical Molecular Dynamics (MD), 3D-Two Phase Thermodynamics (3D-2PT), and umbrella sampling were employed to gain insights into the behavior of IDPs and water. In the first project, the K-18 domain of the intrinsically disordered protein Tau was simulated with four pairs of modified and standard force fields, and the local and total solvation thermodynamics around it were compared. In empirical force fields, an imbalance between intramolecular protein interactions and protein-water interactions often leads to collapsed IDP structures in simulations. To counter this, various methods have been devised to refine protein-water interaction models. This research applied both standard and adapted force fields in simulations, scrutinizing the effects of each adjustment on solvation free energy. In the second project, the MD-based 3D-2PT analysis was used to examine variations in the local entropy and number density of bulk water in response to an electric field, focusing on the vicinity of reference water molecules. In the third project, various peptide sequences were examined to quantify the free energy involved when specific sequences, known as alpha-MoRFs (alpha-Molecular Recognition Features), transition from intrinsically disordered states to structured secondary motifs like the alpha-helix.
The low folding free energy penalty of these sequences can be exploited to design peptide-based or small-molecule drugs. Upon binding to alpha-MoRFs, these drugs can stabilize the helix structure through a binding-induced folding mechanism. Alpha-MoRFs were juxtaposed with entirely disordered sequences from known proteins, with findings benchmarked against leading structure prediction models. Additionally, the binding free energies of various alpha-MoRFs in their folded conformation were assessed to discern if experimental binding free energies reflect the separate contributions of folding and binding, as obtained from umbrella sampling simulations.
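As an illustration of the umbrella sampling setup and the folding/binding decomposition described above, the following is a minimal Python sketch, not the thesis's actual protocol. The function names, the evenly spaced windows, the harmonic form of the bias, and the simple additive combination of folding and binding free energies are all assumptions made for illustration.

```python
def window_centers(xi_min, xi_max, n_windows):
    """Evenly spaced umbrella window centers along a reaction coordinate.

    Hypothetical helper: the thesis does not state how windows were placed.
    """
    if n_windows < 2:
        raise ValueError("need at least two windows")
    step = (xi_max - xi_min) / (n_windows - 1)
    return [xi_min + i * step for i in range(n_windows)]


def harmonic_bias(xi, center, k):
    """Harmonic restraint energy 0.5 * k * (xi - center)^2 for one window."""
    return 0.5 * k * (xi - center) ** 2


def apparent_binding_free_energy(dg_fold, dg_bind_folded):
    """Sum of the folding penalty and the binding free energy of the folded state.

    Assumes the two steps form a simple thermodynamic cycle (disordered ->
    folded -> bound), which is an illustrative simplification.
    """
    return dg_fold + dg_bind_folded


if __name__ == "__main__":
    centers = window_centers(0.0, 1.0, 5)
    print(centers)
    print(harmonic_bias(1.0, centers[2], k=100.0))
    print(apparent_binding_free_energy(2.0, -9.0))
```

In this picture, a low folding penalty (`dg_fold`) leaves most of the folded-state binding free energy available to the overall process, which is the property the abstract proposes to exploit for drug design.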
Date Created
2024
Agent

Adaptive Gray Box Reinforcement Learning Methods to Support Therapeutic Research: From Product design to Manufacturing

Description
This thesis is developed in the context of biomanufacturing of modern products that have the following properties: they require a short design-to-manufacturing time, they have high variability due to a high desired level of patient personalization, and, as a result, they may be manufactured in low volumes. This area at the intersection of therapeutics and biomanufacturing has become increasingly important: (i) a huge push toward the design of new RNA nanoparticles has revolutionized the science of vaccines due to the COVID-19 pandemic; (ii) while the technology to produce personalized cancer medications is available, efficient design and operation of manufacturing systems is not yet agreed upon. In this work, the focus is on operations research methodologies that can support faster design of novel products, specifically RNA, and on methods for enabling personalization in biomanufacturing, specifically for the problem of cancer therapy manufacturing. Across both areas, methods are presented that attempt to embed pre-existing knowledge (e.g., constraints characterizing good molecules, comparison between structures) as well as learn problem structure (e.g., the landscape of the reward function while synthesizing the control for a single-use bioreactor). This thesis produced three key outcomes: (i) ExpertRNA, for the prediction of the structure of an RNA molecule given a sequence. RNA structure is fundamental in determining its function. Therefore, having efficient tools for such prediction can make all the difference for a scientist trying to understand optimal molecule configuration. For the first time, the algorithm allows expert evaluation in the loop to judge the partial predictions that the tool produces; (ii) BioMAN, a discrete event simulation tool for the study of single-use biomanufacturing of personalized cancer therapies.
The discrete event simulation engine was tailored to handle the efficient scheduling of many parallel events, which are caused by the presence of single-use resources. This is the first simulator of this type for individual therapies; (iii) Part-MCTS, a novel sequential decision-making algorithm to support the control of single-use systems. This tool integrates, for the first time, simulation, Monte Carlo tree search, and optimal computing budget allocation for managing the computational effort.
Date Created
2023
Agent

Programming Nucleic Acid Systems through Computation Design: from Dynamic Reaction to Complex Self Assembly

Description
As a rapidly evolving field, nucleic acid nanotechnology focuses on creating functional nanostructures or dynamic devices by harnessing the programmability of nucleic acids, including deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), enabled by predictable Watson-Crick base pairing. Precise control over sequence and structure, along with the development of simulation software for predicting experimental implementations, provides the basis for designing structures or devices with arbitrary topology and operational logic at the nanoscale. Over the past 40 years, the thriving field has pushed the boundaries of nucleic acids from biological macromolecules to functional building blocks, and applications in biomedicine, molecular diagnostics and imaging, material science, electronics, crystallography, and more have emerged through programming the sequences and generating various structures or devices. The underlying logic of nucleic acid programming is the base pairing rule, which is straightforward and robust. For the complicated design of sequences and the quantitative understanding of the programmed results, however, computational tools markedly reduce the level of difficulty and can even meet challenges that are out of reach for manual effort. In this thesis, three individual projects are presented, all of them interweaving theory/computation and experiments. At a higher level of abstraction, this dissertation covers the biophysical understanding of dynamic reactions, the design and realization of complex self-assembly systems, and finally super-resolution imaging. More specifically, Chapter 2 describes the study of RNA strand displacement kinetics with a dedicated model for extracting the reaction rates, providing guidelines for the rational design and regulation of strand displacement reactions and eventually biochemical processes.
Chapter 3 presents the platform for designing self-assembly targets with complex symmetry and the first experimental implementation of the assembly of pyrochlore lattices with DNA origami, which can potentially be applied as optical materials to manipulate light. Chapter 4 focuses on the in-solution characterization of the periodicity of DNA origami lattices with super-resolution microscopy, with algorithms in development for three-dimensional structural reconstruction.
Date Created
2023
Agent

2D and 3D DNA Origami Encryption Optimization Utilizing DNA-PAINT

Description

A form of nanoscale steganography known as DNA origami cryptography is a technique for secure information encryption through self-assembling mixtures of scaffold, staple, and varying docking strands. The all-DNA steganography-based origami was imaged through high-speed DNA-PAINT super-resolution imaging, which uses periodic docking sequences to eliminate the need for protein binding. The purpose of this research was to improve upon the DNA origami cryptography protocol by encrypting information in 2D Rothemund Rectangular DNA Origami (RRO) and 3D cuboctahedron DNA origami as a platform of self-assembling DNA nanostructures to increase the routing possibilities of the scaffold. The initial focus of the work was increasing the incorporation efficiency of all individual docking spots for full 20 nm grid RRO pattern readout. To this end, procedural optimization was pursued by altering the annealing cycle length, adjusting centrifugal spin rates for purification, and lengthening docking strands vs. imager poly-T linkers. A 14 nm grid was explored as an intermediate step before the 10 nm grid to compare the optimized experimental procedure for a higher-density encryption pattern option. Imager concentration was found to be a vital determining factor in effectively resolving the 10 nm grids, because high concentrations of imager strands make simultaneous blinking of adjacent docking strands more likely, which prevents the 10 nm grids from being resolved. A 2-redundancy and a 3-redundancy encryption scheme were developed for encrypting the 10 nm grid RRO. Further experimentation was completed to resolve full 10 nm DNA origami grids and encrypt them with the message "ASU". The message was successfully encrypted and resolved through the high-density 10 nm grid with 2- and 3-redundancy patterns. A cuboctahedron 3D origami was also explored with DNA-PAINT techniques, resulting in successful resolution of the z-axis through variation of the biotin linker length and calibration file.
Positive results were achieved for encryption of the short message "0407" in the cuboctahedron. Data encryption in DNA origami is being explored further and could be an optimal solution for higher-density data storage with greater longevity of media.

Date Created
2023-05
Agent

Integrative Computational Immunology: From Molecules to Mortality

Description
Computational models have long been used to describe and predict the outcome of complex immunological processes. The dissertation work described here centers on the construction of multiscale computational immunology models that derive biological insights at the population, systems, and atomistic levels. First, SARS-CoV-2 mortality is investigated through the lens of the predicted robustness of CD8+ T cell responses in 23 different populations. The robustness of CD8+ T cell responses in a given population was modeled by predicting the efficiency of endemic MHC-I protein variants to present peptides derived from SARS-CoV-2 proteins to circulating T cells. To accomplish this task, an algorithm called EnsembleMHC was developed to predict viral peptides with a high probability of being recognized by CD8+ T cells. It was discovered that there was significant variation in the efficiency of different MHC-I protein variants to present SARS-CoV-2-derived peptides, and countries enriched with variants with high presentation efficiency had significantly lower mortality rates. Second, a biophysics-based MHC-I peptide prediction algorithm was developed. The MHC-I protein is the most polymorphic protein in the human genome, with polymorphisms in the peptide-binding region causing striking changes in the amino acid compositions, or binding motifs, of peptide species capable of stable binding. A deep learning model, coined HLA-Inception, was trained to predict peptide binding using only biophysical properties, namely electrostatic potential. HLA-Inception was shown to be extremely accurate and efficient at predicting peptide binding motifs and was used to determine the peptide binding motifs of 5,821 MHC-I protein variants. Finally, the impact of stalk glycosylations on NL63 protein dynamics was investigated. Previous data have shown that coronavirus crown glycans play an important role in immune evasion and receptor binding; however, little is known about the role of the stalk glycans.
Through the integration of computational biology, experimental data, and physics-based simulations, the stalk glycans were shown to heavily influence the bending angle of spike protein, with a particular emphasis on the glycan at position 1242. Further investigation revealed that removal of the N1242 glycan significantly reduced infectivity, highlighting a new potential therapeutic target. Overall, these investigations and associated innovations in integrative modeling.
Date Created
2022
Agent

Disentangling the Spatial Resolution of Changes in Solvation Free Energy Using Explicit Solvent Molecular Dynamics Simulations

Description
Understanding solvent-mediated interactions in biomolecular systems at the molecular level is important for the development of predictive models for processes such as protein folding and ligand binding to a host biomolecule. Solvent-mediated interactions can be quantified as changes in the solvation free energy of solvated molecules. Theoretical models of solvent-mediated interactions thus need to include ensemble-averaged solute-solvent interactions. In this thesis, molecular dynamics simulations were coupled with the 3D-2PT method to decompose solvation free energies into spatially resolved local contributions. In the first project, this approach was applied to benzene derivatives to guide the development of efficient and predictive models of solvent-mediated interactions in the context of computational drug design. Specifically, the effects of carboxyl and nitro groups on solvation were studied due to their similar steric requirements but distinct interactions with water. A system of solvation free energy arithmetic was developed, which showed that non-additive contributions to the solvation free energy originate in electrostatic solute-solvent interactions, which are qualitatively reproduced by computationally efficient continuum models. In the second project, a simple model system was used to analyze hydrophilic water-mediated interactions (water-mediated hydrogen bonds), which have previously been suggested to play a key role in protein folding. Using the spatially resolved analysis of solvation free energies, the sites of bridging water molecules were identified as the primary origin of solvent-mediated forces, and changes in hydration shell structure were shown to be negligible. In the third project, the analysis of solvation free energy contributions was applied to proteins in inhomogeneous electric fields to explore water-mediated contributions to protein dielectrophoresis.
The results provide a potential explanation for negative dielectrophoretic forces on proteins, which have been observed experimentally but cannot be explained with previous theoretical models.
Date Created
2022
Agent

Software Tools for Design, Simulation, and Characterization of DNA and RNA Nanostructures

Description
Nucleic acid nanotechnology is a field of nanoscale engineering where the sequences of deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) molecules are carefully designed to create self-assembled nanostructures with higher spatial resolution than is available to top-down fabrication methods. In the 40-year history of the field, the structures created have scaled from small tile-like structures constructed from a few hundred individual nucleotides to micron-scale structures assembled from millions of nucleotides using the technique of "DNA origami". One of the key drivers of advancement in any modern engineering field is the parallel development of software that facilitates the design of components and performs in silico simulation of the target structure to determine its structural properties and dynamic behavior and to identify defects. For nucleic acid nanotechnology, CaDNAno and oxDNA are the most popular choices for design and simulation, respectively. In this dissertation I will present my work on the oxDNA software ecosystem, including an analysis toolkit, a web-based graphical interface, and a new molecular visualization tool which doubles as a free-form design editor that covers some of the weaknesses of CaDNAno's lattice-based design paradigm. Finally, as a demonstration of the utility of these new tools, I show oxDNA simulation and subsequent analysis of a nanoscale leaf-spring engine capable of converting chemical energy into dynamic motion. OxDNA simulations were used to investigate the effects of design choices on the behavior of the system and rationalize experimental results.
Date Created
2022
Agent

Computational Analysis & Design of Biopolymers

Description
Biopolymers perform the majority of essential functions necessary for life. From a small number of components emerges considerable complexity in both structure and function. The separated timescales of dynamic processes and intricate intra- and inter-molecular interactions of these molecules necessitate the development and utilization of computational approaches for biopolymer study and nanotechnology applications. Biopolymer nanotechnology exploits the natural chemistry of biopolymers to perform novel functions at the nanoscale. Molecular dynamics is the numerical simulation of chemical entities according to the physical laws of motion and statistical mechanics. The number of atoms in biopolymers requires coarse-grained methods to fully sample the dynamics of the system with reasonable resources. Accordingly, a coarse-grained molecular dynamics model for the characterization of hybrid nucleic acid-protein nanotechnology was developed. Proteins are represented as an anisotropic network model (ANM), which shows good agreement with experimentally derived protein dynamics at a small computational cost. The model was subsequently applied to hybrid DNA-protein cage systems and exhibited excellent agreement with experimental results. Ongoing development efforts look to apply network models to oxDNA origami to create multiscale models for DNA origami. The network approximation will allow for detailed simulation of DNA origami association, which is relevant to DNA crystal and lattice formation. Identification and design of target-specific binders (aptamers) has received considerable attention on account of their diagnostic and therapeutic potential. Generated in selection cycles from extensive random libraries, biopolymer aptamers are of particular interest due to their potential non-immunogenic properties. Machine learning leverages powerful statistical principles to train a model to transform an input into a desired output.
Parameters of the model are iteratively adjusted according to the gradient of the cost function. An unsupervised and generative machine learning model was applied to Thrombin aptamer sequence data. From the model, sequence characteristics necessary for binding were identified and new aptamers capable of binding Thrombin were sampled and verified experimentally. Future work on the development and utilization of an unsupervised and interpretable machine learning model for unaligned sequence data is also discussed.
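The iterative parameter adjustment mentioned above (updating parameters according to the gradient of the cost function) can be illustrated with a minimal Python sketch of plain gradient descent. The quadratic cost, the learning rate, and the function names here are hypothetical choices for illustration; the abstract does not specify the optimizer or the model used in the thesis.

```python
def gradient_descent(grad, theta0, learning_rate=0.1, n_steps=100):
    """Repeatedly move the parameters against the gradient of the cost."""
    theta = list(theta0)
    for _ in range(n_steps):
        g = grad(theta)
        # Step each parameter in the direction that lowers the cost.
        theta = [t - learning_rate * gi for t, gi in zip(theta, g)]
    return theta


def quadratic_cost_grad(theta):
    """Gradient of the toy cost (x - 3)^2 + (y + 1)^2, minimized at (3, -1)."""
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]


if __name__ == "__main__":
    print(gradient_descent(quadratic_cost_grad, [0.0, 0.0]))
```

For this toy cost, each step shrinks the distance to the minimum by a constant factor, so the parameters approach (3, -1); real model training replaces the toy gradient with the gradient of the model's cost over the training data.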
Date Created
2022
Agent

Comparative Analysis of Molecular Simulations

Description
The purpose of this project was to compare the different physical models behind four algorithms in computational chemistry: molecular dynamics with a thermostat (specifically simple velocity rescaling, Berendsen, and Nosé-Hoover), Langevin dynamics, Brownian dynamics, and Monte Carlo. These algorithms were programmed in C, and the impact of specific parameters, such as the coupling parameter and time step, was studied. Their results were compared based on their radial distribution functions and, when the thermostats were in use, fluctuations in temperature.
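To illustrate how the coupling parameter and time step enter a thermostat, the following is a minimal Python sketch of the standard Berendsen velocity scaling factor. It is not the project's C code; the function names are hypothetical, and the relaxation loop tracks only the temperature, with no particles or forces, as a simplifying assumption.

```python
import math


def berendsen_scale(t_current, t_target, dt, tau):
    """Velocity scaling factor for one Berendsen thermostat step.

    tau is the coupling time; tau == dt reduces to simple velocity rescaling.
    """
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_current - 1.0))


def relax_temperature(t_start, t_target, dt, tau, n_steps):
    """Apply the scaling repeatedly to the temperature alone (no dynamics)."""
    temps = [t_start]
    t = t_start
    for _ in range(n_steps):
        lam = berendsen_scale(t, t_target, dt, tau)
        t = t * lam * lam  # kinetic temperature scales with velocity squared
        temps.append(t)
    return temps


if __name__ == "__main__":
    # Weak coupling: the temperature drifts slowly toward the target.
    print(relax_temperature(400.0, 300.0, dt=0.002, tau=0.1, n_steps=3))
    # tau == dt: the target is reached in a single step.
    print(relax_temperature(400.0, 300.0, dt=0.002, tau=0.002, n_steps=1))
```

A larger coupling time relative to the time step gives a gentler correction per step, which is one reason the choice of coupling parameter affects the observed temperature fluctuations.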
Date Created
2022-12
Agent

Analyzing the Effects of Conformational Fluctuations on Protein-Water Interactions in Barnase-Barstar Using All Atom Molecular Dynamics Simulations

Description
Barnase-Barstar is a protein complex that has a strong association constant. The purpose of this research is to investigate the effects of conformational fluctuations on protein-water interactions, the resulting water-mediated interactions, and the binding free energy of the protein complex. To analyze the properties and interactions that result in the strong association of the Barnase-Barstar complex, sets of all-atom molecular dynamics simulations of flexible and rigid proteins were prepared to identify the effects on water-mediated interactions. The GROMACS manual was reviewed thoroughly and the GROMACS Lysozyme in Water tutorial was completed to understand the steps and commands needed to write and run molecular dynamics simulations. The preliminary data address the impact of water-mediated interactions on the solvation free energy in the Barnase-Barstar protein complex where the proteins are kept rigid. This was achieved by observing the change in solvation free energy with respect to separation distance. From the data obtained, it is concluded that solvent-mediated interactions do not contribute to the negative binding free energy. With increasing separation distance, the change in solvation free energy decreased. Therefore, thermodynamically, water-mediated interactions destabilize the protein complex, while the binding free energy is dominated by direct protein-protein interactions. The follow-up simulations of flexible proteins with controlled protein-protein separation distances, for which a fully automated simulation and analysis protocol has been prepared in this project, will make it possible to quantify the impact of conformational fluctuations on water-mediated interactions and the binding free energy of the protein complex by comparison to the simulations of rigid proteins.
Date Created
2022-05
Agent