Bayesian Approach in Addressing Simultaneous Gene Network Model Selection and Parameter Estimation with Snapshot Data
Document
Description
Gene expression models are key to understanding and predicting transcriptional dynamics. This thesis devises a computational method which can efficiently explore a large, highly correlated parameter space, ultimately allowing the author to accurately deduce the underlying gene network model using discrete, stochastic mRNA counts derived through the non-invasive imaging method of single molecule fluorescence in situ hybridization (smFISH). An underlying gene network model consists of the number of gene states (distinguished by distinct production rates) and all associated kinetic rate parameters. In this thesis, the author constructs an algorithm based on Bayesian parametric and nonparametric theory, expanding the traditional single gene network inference tools. This expansion starts by increasing the efficiency of classic Markov-Chain Monte Carlo (MCMC) sampling by combining three schemes known in the Bayesian statistical computing community: 1) Adaptive Metropolis-Hastings (AMH), 2) Hamiltonian Monte Carlo (HMC), and 3) Parallel Tempering (PT). The aggregation of these three methods decreases the autocorrelation between sequential MCMC samples, reducing the number of samples required to gain an accurate representation of the posterior probability distribution. Second, by employing Bayesian nonparametric methods, the author is able to simultaneously evaluate discrete and continuous parameters, enabling the method to devise the structure of the gene network and all kinetic parameters, respectively. Due to the nature of Bayesian theory, uncertainty is evaluated for the gene network model in combination with the kinetic parameters. Tools brought from Bayesian nonparametric theory equip the method with an ability to sample from the posterior distribution of all possible gene network models without pre-defining the gene network structure, i.e. the number of gene states. The author verifies the method’s robustness through the use of synthetic snapshot data, designed to closely represent experimental smFISH data sets, across a range of gene network model structures, parameters and experimental settings (number of probed cells and timepoints).