Wu, Steven | KEEP

Low Base-Substitution Mutation Rate in the Germline Genome of the Ciliate 'Tetrahymena Thermophila'

Description

Mutation is the ultimate source of all genetic variation and is, therefore, central to evolutionary change. Previous work on Paramecium tetraurelia found an unusually low germline base-substitution mutation rate in this ciliate. Here, we tested the generality of this result among ciliates using Tetrahymena thermophila. We sequenced the genomes of 10 lines of T. thermophila that had each undergone approximately 1,000 generations of mutation accumulation (MA). We applied an existing mutation-calling pipeline and developed a new probabilistic mutation detection approach that directly models the design of an MA experiment and accommodates the noise introduced by mismapped reads. Our probabilistic mutation-calling method provides a straightforward way of estimating the number of sites at which a mutation could have been called if one was present, providing the denominator for our mutation rate calculations. From these methods, we find that T. thermophila has a germline base-substitution mutation rate of 7.61 × 10 ^-12 per-site, per cell division, which is consistent with the low base-substitution mutation rate in P. tetraurelia. Over the course of the evolution experiment, genomic exclusion lines derived from the MA lines experienced a fitness decline that cannot be accounted for by germline base-substitution mutations alone, suggesting that other genetic or epigenetic factors must be involved. Because selection can only operate to reduce mutation rates based upon the "visible" mutational load, asexual reproduction with a transcriptionally silent germline may allow ciliates to evolve extremely low germline mutation rates.

Date Created

2016-09-15

Agent

Author (aut): Long, Hongan
Author (aut): Winter, David
Author (aut): Chang, Allan Y.-C.
Author (aut): Sung, Way
Author (aut): Wu, Steven
Author (aut): Balboa, Mariel
Author (aut): Azevedo, Ricardo B. R.
Author (aut): Cartwright, Reed
Author (aut): Lynch, Michael
Author (aut): Zufall, Rebecca A.
Contributor (ctb): Biodesign Institute

Estimation of Evolutionary Parameters Using Short, Random, and Partial Sequences From Mixed Samples of Anonymous Individuals

Description

Background: Over the last decade, next generation sequencing (NGS) has become widely available, and is now the sequencing technology of choice for most researchers. Nonetheless, NGS presents a challenge for the evolutionary biologists who wish to estimate evolutionary genetic parameters from a mixed sample of unlabeled or untagged individuals, especially when the reconstruction of full length haplotypes can be unreliable. We propose two novel approaches, least squares estimation (LS) and Approximate Bayesian Computation Markov chain Monte Carlo estimation (ABC-MCMC), to infer evolutionary genetic parameters from a collection of short-read sequences obtained from a mixed sample of anonymous DNA using the frequencies of nucleotides at each site only without reconstructing the full-length alignment nor the phylogeny.

Results: We used simulations to evaluate the performance of these algorithms, and our results demonstrate that LS performs poorly because bootstrap 95 % Confidence Intervals (CIs) tend to under- or over-estimate the true values of the parameters. In contrast, ABC-MCMC 95 % Highest Posterior Density (HPD) intervals recovered from ABC-MCMC enclosed the true parameter values with a rate approximately equivalent to that obtained using BEAST, a program that implements a Bayesian MCMC estimation of evolutionary parameters using full-length sequences. Because there is a loss of information with the use of sitewise nucleotide frequencies alone, the ABC-MCMC 95 % HPDs are larger than those obtained by BEAST.

Conclusion: We propose two novel algorithms to estimate evolutionary genetic parameters based on the proportion of each nucleotide. The LS method cannot be recommended as a standalone method for evolutionary parameter estimation. On the other hand, parameters recovered by ABC-MCMC are comparable to those obtained using BEAST, but with larger 95 % HPDs. One major advantage of ABC-MCMC is that computational time scales linearly with the number of short-read sequences, and is independent of the number of full-length sequences in the original data. This allows us to perform the analysis on NGS datasets with large numbers of short read fragments. The source code for ABC-MCMC is available at https://github.com/stevenhwu/SF-ABC.

Date Created

2015-11-04

Agent

Author (aut): Wu, Steven
Author (aut): Rodrigo, Allen G.
Contributor (ctb): Biodesign Institute