Generalized linear models in Bayesian phylogeography

Description
Bayesian phylogeography is a framework that has enabled researchers to model the spatiotemporal diffusion of pathogens. In general, the framework assumes that discrete geographic sampling traits follow a continuous-time Markov chain process along the branches of an unknown phylogeny that is informed through nucleotide sequence data. Recently, this framework has been extended to model the transition rate matrix between discrete states as a generalized linear model (GLM) of predictors relevant to the pathogen. In this dissertation, I focus on these GLMs, describe their capabilities and limitations, and introduce a pipeline that may enable more researchers to use this framework.
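
As a rough illustration of what modeling the transition rate matrix as a GLM can mean, the sketch below builds a continuous-time Markov chain rate matrix from pairwise predictors. It assumes the commonly described log-linear form, in which the log of each rate is a sum of coefficient times inclusion indicator times predictor value; the abstract does not state the exact parameterization, and the function name `glm_rate_matrix`, the `distance` predictor, and all numeric values are hypothetical.

```python
import math


def glm_rate_matrix(predictors, beta, delta):
    """Build a CTMC rate matrix from pairwise predictors (illustrative sketch).

    predictors: list of K x K matrices, one per predictor (assumed to be
                already log-transformed and standardized).
    beta:       one coefficient per predictor.
    delta:      one 0/1 inclusion indicator per predictor.
    """
    k = len(predictors[0])
    q = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            # Assumed log-linear form: log(rate_ij) = sum_p beta_p * delta_p * x_p[i][j]
            log_rate = sum(b * d * x[i][j] for b, d, x in zip(beta, delta, predictors))
            q[i][j] = math.exp(log_rate)
        # The diagonal is still 0.0 here, so this is the negated off-diagonal row sum;
        # each row of a CTMC generator must sum to zero.
        q[i][i] = -sum(q[i])
    return q


# Hypothetical pairwise predictor among three locations.
distance = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]

q_included = glm_rate_matrix([distance], beta=[-0.5], delta=[1])
q_excluded = glm_rate_matrix([distance], beta=[-0.5], delta=[0])
```

With the indicator set to 0, the predictor drops out and every off-diagonal rate equals 1, which is why posterior support for a predictor can be summarized through how often its indicator is switched on.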

First, using influenza A/H5N1 in Egypt as an example, I demonstrate how a GLM can be employed and how support for the predictors can be measured. Second, I compare the GLM framework to two alternative frameworks of Bayesian phylogeography: one that uses an advanced computational technique and one that does not. For this assessment, I model the diffusion of influenza A/H3N2 in the United States during the 2014-15 flu season with five methods encapsulated by the three frameworks. I summarize metrics of the phylogenies created by each and demonstrate their reproducibility by performing analyses on several random sequence samples under a variety of population growth scenarios. Next, I demonstrate how discretization of the location trait for a given sequence set can influence phylogenies and support for predictors. That is, I perform several GLM analyses on a set of sequences while changing how the sequences are pooled, then show how aggregating predictors at four levels of spatial resolution alters posterior support. Finally, I provide a solution for researchers who wish to use the GLM framework but may be deterred by the tedious file manipulation it requires. My pipeline, which is publicly available, should alleviate concerns about the difficulty and time-consuming nature of creating the files necessary to perform GLM analyses. This dissertation expands the knowledge of Bayesian phylogeographic GLMs and will facilitate the use of this framework, which may ultimately reveal the variables that drive the spread of pathogens.
Date Created
2017

Plasmodium population structure in the context of malaria control and elimination

Description
Malaria is a vector-borne parasitic disease affecting tropical and subtropical regions. Despite control efforts, malaria incidence remains very high, with 219 million clinical cases and an estimated 660,000 related deaths (WHO, 2012). In this project, different population genetic approaches were explored to characterize parasite populations. The goal was to create a framework that considers temporal and spatial changes of Plasmodium populations in malaria surveillance. This is critical for a vector-borne disease in areas of low transmission, where there is no accurate information on when and where a patient was infected. In this study, fragment analysis data and single nucleotide polymorphisms (SNPs) from South American samples were used to characterize Plasmodium population structure and patterns of migration and gene flow, and to discuss approaches for differentiating reinfection from recrudescence in clinical trials. A Bayesian approach was also applied to analyze Plasmodium population history by inferring genealogies from microsatellite data. Specifically, fluctuations in the parasite population and the age of different parasite lineages were evaluated through time in order to relate them to the malaria control plan in force. These studies are important for understanding the turnover or persistence of "clones" circulating in a specific area through time and for accounting for them in drug efficacy studies. Moreover, this methodology is useful for assessing changes in malaria transmission and for managing resources more efficiently when deploying control measures in locations that act as parasite "sources" for other regions. Overall, these results stress the importance of monitoring malaria demographic changes when assessing the success of elimination programs in areas of low transmission.
Date Created
2014

The effects of natural selection and random genetic drift in structured populations

Description
Building mathematical models and examining the compatibility of their theoretical predictions with empirical data are important for our understanding of evolution. The rapidly increasing amounts of genomic data on polymorphisms greatly motivate evolutionary biologists to find targets of positive selection. Although intensive mathematical and statistical studies have been conducted to characterize signatures of positive selection and identify its targets, relatively little is known about the effects of other evolutionary forces on these signatures. In this dissertation, I investigate the effects of various evolutionary factors, including purifying selection and population demography, on signatures of positive selection. Specifically, I investigate the effects on two widely used methods for detecting positive selection, one based on Wright's Fst and its analogues and the other on footprints of genetic hitchhiking. In Chapters 2 and 3, the effect of purifying selection on Fst is studied. The results show that purifying selection intensity greatly affects Fst by modulating allele frequencies across populations. The footprints of genetic hitchhiking in a geographically structured population are studied in Chapter 4. The results demonstrate that footprints of genetic hitchhiking are significantly influenced by geographic structure, which may help scientists to infer the origin and spread of the beneficial allele. In Chapter 5, the stochastic dynamics of a hitchhiking allele are studied using the diffusion process of genetic hitchhiking conditioned on the fixation of the beneficial allele. Explicit formulae for the conditioned two-locus diffusion process of genetic hitchhiking are derived and stochastic aspects of genetic hitchhiking are investigated. The results in this dissertation show that it is essential to model the interaction of neutral and selective forces for correct identification of the targets of positive selection.
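
To make the link between allele frequencies across populations and Fst concrete, here is a minimal sketch of the textbook heterozygosity-based definition, Fst = (H_T - H_S) / H_T, for a single biallelic locus. It assumes equally sized subpopulations and known allele frequencies; it is not the estimator or model used in the dissertation, and the function name `wright_fst` and the example frequencies are hypothetical.

```python
def wright_fst(allele_freqs):
    """Wright's Fst for one biallelic locus (illustrative sketch).

    allele_freqs: frequency of one allele in each subpopulation,
                  assuming equally sized subpopulations.
    """
    n = len(allele_freqs)
    # H_S: mean expected heterozygosity within subpopulations.
    h_s = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / n
    # H_T: expected heterozygosity of the pooled population.
    p_bar = sum(allele_freqs) / n
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        # The locus is monomorphic overall; treat differentiation as zero
        # (an arbitrary convention chosen for this sketch).
        return 0.0
    return (h_t - h_s) / h_t


fst_identical = wright_fst([0.5, 0.5])   # same frequencies in both subpopulations
fst_divergent = wright_fst([0.2, 0.8])   # frequencies pushed apart
fst_fixed = wright_fst([0.0, 1.0])       # alternative alleles fixed
```

In this sketch, identical frequencies give an Fst of 0 and alternatively fixed alleles give 1, so any force that shifts allele frequencies differently among subpopulations changes the statistic.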
Date Created
2011