Bayesian Approach in Addressing Simultaneous Gene Network Model Selection and Parameter Estimation with Snapshot Data

Description
Gene expression models are key to understanding and predicting transcriptional dynamics. This thesis devises a computational method which can efficiently explore a large, highly correlated parameter space, ultimately allowing the author to accurately deduce the underlying gene network model using discrete, stochastic mRNA counts derived through the non-invasive imaging method of single molecule fluorescence in situ hybridization (smFISH). An underlying gene network model consists of the number of gene states (distinguished by distinct production rates) and all associated kinetic rate parameters. In this thesis, the author constructs an algorithm based on Bayesian parametric and nonparametric theory, expanding the traditional single gene network inference tools. First, the efficiency of classic Markov chain Monte Carlo (MCMC) sampling is increased by combining three schemes known in the Bayesian statistical computing community: 1) Adaptive Metropolis-Hastings (AMH), 2) Hamiltonian Monte Carlo (HMC), and 3) Parallel Tempering (PT). The combination of these three methods decreases the autocorrelation between sequential MCMC samples, reducing the number of samples required to gain an accurate representation of the posterior probability distribution. Second, by employing Bayesian nonparametric methods, the author is able to simultaneously evaluate discrete and continuous parameters, enabling the method to determine the structure of the gene network and all kinetic parameters, respectively. Due to the nature of Bayesian theory, uncertainty is evaluated for the gene network model in combination with the kinetic parameters. Tools brought from Bayesian nonparametric theory equip the method with an ability to sample from the posterior distribution of all possible gene network models without pre-defining the gene network structure, i.e., the number of gene states.
The author verifies the method’s robustness through the use of synthetic snapshot data, designed to closely represent experimental smFISH data sets, across a range of gene network model structures, parameters and experimental settings (number of probed cells and timepoints).
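As a rough illustration of one of the three schemes named above, parallel tempering, the following Python sketch runs several random-walk Metropolis chains at different inverse temperatures and proposes swaps between adjacent chains. It is not the thesis's algorithm: the bimodal target `log_target`, the temperature ladder `betas`, the step size, and all function names are assumptions chosen for illustration, and the adaptive and Hamiltonian components are omitted.

```python
import numpy as np

def log_target(x):
    # Hypothetical bimodal target (unnormalized): equal mixture of
    # N(-3, 0.5^2) and N(3, 0.5^2). Works elementwise on arrays.
    return np.logaddexp(-0.5 * ((x + 3.0) / 0.5) ** 2,
                        -0.5 * ((x - 3.0) / 0.5) ** 2)

def parallel_tempering(log_prob, betas, n_steps, step_size=1.0, seed=0):
    """Return samples from the beta = betas[0] chain.

    Chain k targets exp(betas[k] * log_prob(x)); log_prob is assumed
    to accept a NumPy array and return one value per element.
    """
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    n_chains = betas.size
    x = rng.normal(size=n_chains)          # one state per chain
    lp = log_prob(x)
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # Within-chain random-walk Metropolis update for every chain.
        prop = x + step_size * rng.normal(size=n_chains)
        lp_prop = log_prob(prop)
        accept = np.log(rng.uniform(size=n_chains)) < betas * (lp_prop - lp)
        x = np.where(accept, prop, x)
        lp = np.where(accept, lp_prop, lp)
        # Propose swapping the states of one random adjacent pair.
        i = rng.integers(n_chains - 1)
        log_ratio = (betas[i] - betas[i + 1]) * (lp[i + 1] - lp[i])
        if np.log(rng.uniform()) < log_ratio:
            x[[i, i + 1]] = x[[i + 1, i]]
            lp[[i, i + 1]] = lp[[i + 1, i]]
        samples[t] = x[0]                  # record the untempered chain
    return samples

# Assumed temperature ladder: beta = 1 is the target itself,
# smaller beta flattens the distribution so chains cross between modes.
samples = parallel_tempering(log_target, betas=[1.0, 0.3, 0.1, 0.03],
                             n_steps=20000)
```

The hotter chains move freely between the two modes, and accepted swaps carry those moves down to the `beta = 1` chain, which is the mechanism by which tempering can reduce autocorrelation in a multimodal or strongly correlated posterior.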
Date Created
2024

Gas Mixture Dynamics in Pipeline Networks with a Focus on Linearization and Optimal Control

Description
Balancing temporal shortages of renewable energy with natural gas for the generation of electricity is a challenge for dispatchers. This is compounded by the recent proposal of blending cleanly-produced hydrogen into natural gas networks. To introduce the concepts of gas flow, this thesis begins by linearizing the partial differential equations (PDEs) that govern the flow of natural gas in a single pipe. The solution of the linearized PDEs is used to investigate wave attenuation and characterize critical operating regions where linearization is applicable. The nonlinear PDEs for a single gas are extended to mixtures of gases with the addition of a PDE that governs the conservation of composition. The gas mixture formulation is developed for general gas networks that can inject or withdraw arbitrary time-varying mixtures of gases into or from the network at arbitrarily specified nodes, while being influenced by time-varying control actions of compressor units. The PDE formulation is discretized in space to form a nonlinear control system of ordinary differential equations (ODEs), which is used to prove that homogeneous mixtures are well-behaved and heterogeneous mixtures may be ill-behaved in the sense of monotone-ordering of solutions. Numerical simulations are performed to compute interfaces that delimit monotone and periodic system responses. The ODE system is used as the constraints of an optimal control problem (OCP) to minimize the expended energy of compressors. Moreover, the ODE system for the natural gas network is linearized and used as the constraints of a linear OCP. The OCPs are digitally implemented as optimization problems following the discretization of the time domain. The optimization problems are applied to pipelines and small test networks. Some qualitative and computational applications, including linearization error analysis and transient responses, are also investigated.
Date Created
2023

Network Based Models of Opinion Formation: Consensus and Beyond

Description
Understanding the evolution of opinions is a delicate task as the dynamics of how one changes their opinion based on their interactions with others are unclear.
Date Created
2021

Fatigue and Free Throw Shooting Ability in the NBA

Description

We analyze the effect of fatigue on free throw efficiency in the National Basketball Association (NBA) using play-by-play data from regular-season, regulation-length games in the 2016-2017, 2017-2018, and 2018-2019 seasons. Using both regression and tree-based statistical methods, we estimate how total minutes played and continuous minutes played at the time of a free throw attempt affect a player's odds of making the attempt, while controlling for prior free throw shooting ability, longer-term fatigue, and other game factors. Our results offer strong evidence that short-term activity after periods of inactivity positively affects free throw efficiency, while longer-term fatigue has no effect.

Date Created
2021-05

Learning the Diffusion Coefficient on a Cell Membrane

Description

A statistical method is proposed to learn the diffusion coefficient at any point on a cell membrane. The method uses Bayesian nonparametrics to learn this value. Learning the diffusion coefficient may be useful for better understanding cellular dynamics.

Date Created
2021-05

Modeling collective motion of complex systems using agent-based models & macroscopic models

Description
The main objective of mathematical modeling is to connect mathematics with other scientific fields. Developing predictive models helps to understand the behavior of biological systems. By testing models, one can relate mathematics and real-world experiments. To validate predictions numerically, one has to compare them with experimental data sets. Mathematical models can be split into two groups: microscopic and macroscopic. Microscopic models describe the motion of so-called agents (e.g. cells, ants) that interact with their surrounding neighbors. At a large scale, the interactions among these agents form special structures such as flocks and swarms. One of the key questions is how to relate the particular interactions among agents to the overall emerging structures. Macroscopic models are designed precisely to describe the evolution of such large structures. They are usually given as partial differential equations describing the time evolution of a density distribution (instead of tracking each individual agent). For instance, reaction-diffusion equations are used to model glioma cells and to predict tumor growth. This dissertation aims at developing such a framework to better understand the complex behavior of foraging ants and glioma cells.
Date Created
2019

Chance-constrained optimization models for agricultural seed development and selection

Description
Breeding seeds to include desirable traits (increased yield, drought/temperature resistance, etc.) is a growing and important method of establishing food security. However, besides breeder intuition, few decision-making tools exist that can provide breeders with credible evidence for deciding which seeds to progress to further stages of development. This thesis attempts to create a chance-constrained knapsack optimization model, which the breeder can use to make better decisions about seed progression and to help reduce the level of risk in their selections. The model's objective is to select seed varieties out of a larger pool of varieties and maximize the average yield of the "knapsack" subject to meeting some risk criteria. Two models are created for different cases. The first is the risk reduction model, which seeks to reduce the risk of getting a bad yield while still maximizing the total yield. The second model considers the possibility of adverse environmental effects and seeks to mitigate the negative effects they could have on the total yield. In practice, breeders can use these models to better quantify uncertainty in selecting seed varieties.
Date Created
2019

NBA Player Clustering: Exploring Player Archetypes in a Changing NBA

Description
The findings of this project show that through the use of principal component analysis and K-Means clustering, NBA players can be algorithmically classified into distinct clusters, each representing a player archetype. Individual player data for the 2018-2019 regular season was collected for 150 players, and this included regular per-game statistics, such as rebounds, assists, field goals, etc., and advanced statistics, such as usage percentage, win shares, and value over replacement player. The analysis was carried out using the statistical programming language R in the integrated development environment RStudio. The principal component analysis was computed first in order to produce a set of five principal components, which explain roughly 82.20% of the total variance within the player data. These five principal components were then used as the features on which the players were clustered by the K-Means clustering algorithm implemented in R. It was determined that eight clusters would best represent the groupings of the players, and eight clusters were created with a unique set of players belonging to each one. Each cluster was analyzed based on the players making up the cluster, and a player archetype was established to define each of the clusters. The reasoning behind the player archetype given to each cluster was explained, providing details as to why the players were clustered together and the main data features that influenced the clustering results. Except for two of the clusters, the archetypes were proven to be independent of the player's position. The clustering results can be expanded on in the future to include a larger sample of players, and they can be used to make inferences regarding NBA roster construction. The clustering can highlight key weaknesses in rosters and show which combinations of player archetypes lead to team success.
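The two-stage pipeline described above (principal component analysis, then K-Means on the leading components) can be sketched as follows. The project itself used R; this Python version is only an illustration. The random matrix standing in for the 150-player data set, the choice of 20 statistics per player, the standardization step, and the function names `pca` and `kmeans` are all assumptions, so the variance explained here will not match the 82.20% reported.

```python
import numpy as np

def pca(X, n_components):
    """Project standardized data onto its leading principal components."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)   # assumed: z-score each statistic
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)          # variance ratio per component
    scores = Xc @ Vt[:n_components].T
    return scores, explained[:n_components]

def assign(X, centers):
    """Index of the nearest center (Euclidean distance) for each row of X."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random distinct data points as initial centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers

# Placeholder data: 150 "players" by 20 made-up statistics.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 20))

scores, explained = pca(X, n_components=5)   # five components, as in the abstract
labels, centers = kmeans(scores, k=8)        # eight clusters, as in the abstract
```

Clustering on the component scores rather than the raw statistics reduces the influence of highly correlated box-score columns, which is one common reason for running the analysis in this order.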
Date Created
2019-05

Supervised and ensemble classification of multivariate functional data: applications to lupus diagnosis

Description
This dissertation investigates the classification of systemic lupus erythematosus (SLE) in the presence of non-SLE alternatives, while developing novel curve classification methodologies with wide-ranging applications. Functional data representations of plasma thermogram measurements and the corresponding derivative curves provide predictors yet to be investigated for SLE identification. Functional nonparametric classifiers form a methodological basis, which is used herein to develop a) the family of ESFuNC segment-wise curve classification algorithms and b) per-pixel ensembles based on logistic regression and fused-LASSO. The proposed methods achieve test set accuracy rates as high as 94.3%, while returning information about regions of the temperature domain that are critical for population discrimination. The analyses suggest that derivative-based information contributes significantly to improved classification performance relative to recently published studies on SLE plasma thermograms.
Date Created
2018

Critical coupling and synchronized clusters in arbitrary networks of Kuramoto oscillators

Description
The Kuramoto model is an archetypal model for studying synchronization in groups of nonidentical oscillators where oscillators are imbued with their own frequency and coupled with other oscillators through a network of interactions. As the coupling strength increases, there is a bifurcation to complete synchronization where all oscillators move with the same frequency and show a collective rhythm. Kuramoto-like dynamics are considered a relevant model for instabilities of the AC power grid, which operates in synchrony under standard conditions but exhibits, in a state of failure, segmentation of the grid into desynchronized clusters.
In this dissertation the minimum coupling strength required to ensure total frequency synchronization in a Kuramoto system, called the critical coupling, is investigated. For coupling strength below the critical coupling, clusters of oscillators form where oscillators within a cluster are on average oscillating with the same long-term frequency. A unified order-parameter-based approach is developed to create approximations of the critical coupling. Some of the new approximations provide strict lower bounds for the critical coupling. In addition, these approximations allow for predictions of the partially synchronized clusters that emerge in the bifurcation from the synchronized state.
Merging the order parameter approach with graph-theoretical concepts leads to a characterization of this bifurcation as a weighted graph partitioning problem on an arbitrary network, which then leads to an optimization problem that can efficiently estimate the partially synchronized clusters. Numerical experiments on random Kuramoto systems show the high accuracy of these methods. An interpretation of the methods in the context of power systems is provided.
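For readers unfamiliar with the model, the sketch below simulates a networked Kuramoto system in its standard form, dθ_i/dt = ω_i + K Σ_j A_ij sin(θ_j − θ_i), and computes the usual order parameter r = |(1/N) Σ_j exp(iθ_j)|. It does not implement the dissertation's critical-coupling approximations or its graph partitioning method. The forward-Euler integrator, the complete-graph adjacency matrix, the coupling value, and the function names are assumptions made for illustration.

```python
import numpy as np

def simulate_kuramoto(omega, A, K, theta0, dt=0.01, n_steps=2000):
    """Integrate the networked Kuramoto equations with forward Euler.

    omega: natural frequencies, A: adjacency matrix, K: coupling strength.
    Returns the phases after n_steps steps of size dt.
    """
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        diff = theta[None, :] - theta[:, None]      # diff[i, j] = theta_j - theta_i
        coupling = K * np.sum(A * np.sin(diff), axis=1)
        theta = theta + dt * (omega + coupling)
    return theta

def order_parameter(theta):
    """Magnitude r of the mean phase vector; r near 1 means phases are aligned."""
    return np.abs(np.mean(np.exp(1j * theta)))

# Hypothetical test system: 20 oscillators on a complete graph.
rng = np.random.default_rng(0)
n = 20
omega = rng.normal(size=n)                   # nonidentical natural frequencies
A = np.ones((n, n)) - np.eye(n)              # all-to-all coupling, no self-loops
theta0 = rng.uniform(0.0, 2.0 * np.pi, size=n)

theta_sync = simulate_kuramoto(omega, A, K=0.5, theta0=theta0)
r_sync = order_parameter(theta_sync)         # strong coupling: phases align
```

Sweeping `K` downward from a large value and watching `r` or the long-term average frequencies is a simple numerical way to see the transition from full synchronization to clusters that the abstract describes.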
Date Created
2018