Revealing microbial responses to environmental dynamics: developing methods for analysis and visualization of complex sequence datasets

Description
The greatest barrier to understanding how life interacts with its environment is the complexity with which biology operates. In this work, I present experimental designs, analysis methods, and visualization techniques to overcome the challenges of deciphering complex biological datasets. First, I examine an iron limitation transcriptome of Synechocystis sp. PCC 6803 using a new methodology. Until now, iron limitation in experiments on Synechocystis sp. PCC 6803 gene expression has been achieved through media chelation. Notably, chelation also reduces the bioavailability of other metals, whereas naturally occurring low-iron settings likely result from a lack of iron influx rather than from chelation. The overall metabolic trends of previous studies are well characterized, but within those trends there is significant variability in single gene expression responses. I compare previous transcriptomics analyses with our protocol, which limits the addition of bioavailable iron to growth media, to identify consistent gene expression signals resulting from iron limitation. Second, I describe a novel method of improving the reliability of centroid-linkage clustering results. The size and complexity of modern sequencing datasets often prohibit constructing distance matrices, which prevents the use of many common clustering algorithms. Centroid-linkage circumvents the need for a distance matrix, but has the adverse effect of producing input-order dependent results. In this chapter, I describe a method of cluster edge counting across iterated centroid-linkage results and reconstructing aggregate clusters from a ranked edge list without a distance matrix or input-order dependence. Finally, I introduce dendritic heat maps, a new figure type that visualizes heat map responses through expanding and contracting sequence clustering specificities. Heat maps are useful for comparing data across a range of possible states. However, data binning is sensitive to clustering cutoffs, which are often arbitrarily introduced by researchers and can substantially change the heat map response of any single data point. With an understanding of how the architectural elements of dendrograms and heat maps affect data visualization, I have integrated their salient features to create a figure type aimed at viewing multiple levels of clustering cutoffs, allowing researchers to better understand the effects of environment on metabolism or phylogenetic lineages.
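
The edge-counting idea described for centroid-linkage clustering can be illustrated with a minimal sketch. It is not the dissertation's implementation: the greedy centroid rule, the positional identity measure, the `support` threshold, and every function name are assumptions made for illustration. Each shuffled run produces order-dependent clusters, pairs of sequences that share a cluster are counted as edges, and aggregate clusters are rebuilt from the ranked edge list without a distance matrix.

```python
import random
from collections import Counter
from itertools import combinations


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_centroid_clusters(seqs, order, cutoff):
    """Greedy centroid clustering; the result depends on the input order."""
    centroids, clusters = [], []
    for i in order:
        for c, members in zip(centroids, clusters):
            if identity(seqs[i], seqs[c]) >= cutoff:
                members.append(i)  # join the first centroid that is close enough
                break
        else:
            centroids.append(i)  # no centroid matched, so start a new cluster
            clusters.append([i])
    return clusters


def count_edges(seqs, cutoff, runs, seed=0):
    """Count how often each pair shares a cluster across shuffled runs."""
    rng = random.Random(seed)
    edges = Counter()
    order = list(range(len(seqs)))
    for _ in range(runs):
        rng.shuffle(order)
        for members in greedy_centroid_clusters(seqs, order, cutoff):
            for a, b in combinations(sorted(members), 2):
                edges[(a, b)] += 1
    return edges


def aggregate_clusters(n, edges, runs, support=0.5):
    """Union pairs, strongest first, whose co-clustering rate exceeds support."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), count in edges.most_common():  # ranked edge list
        if count / runs <= support:
            break
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

In this sketch a single run over the same sequences can split or merge a borderline sequence depending on which sequence happens to become a centroid first, while the aggregate step keeps only pairs that co-cluster in more than the chosen fraction of runs. The 0.5 default for `support` is an arbitrary illustrative value.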
Date Created
2017
Agent

Draft Genome Sequence of Microvirga sp. Strain BSC39, Isolated From Biological Soil Crust of Moab, Utah

Description

Microvirga sp. BSC39 was isolated from a biological soil crust near Moab, Utah. The strain appears to be capable of chemotaxis and exopolysaccharide synthesis for biofilm adhesion. The BSC39 genome contains iron siderophore uptake and hydrolysis enzymes; however, it lacks siderophore synthesis pathways, suggesting the uptake of siderophores produced by neighboring microbes.

Date Created
2014-11-13
Agent

Draft Genome Sequence of Massilia sp. Strain BSC265, Isolated From Biological Soil Crust of Moab, Utah

Description

Massilia sp. BSC265 was isolated from a biological soil crust near Moab, Utah. The strain appears to be capable of chemotaxis and exopolysaccharide synthesis for biofilm adhesion. The BSC265 genome contains a complete dissimilatory nitrate reduction pathway as well as a TCA cycle, making it a facultative anaerobe.

Date Created
2014-11-13
Agent

Draft Genome Sequence of Bacillus sp. Strain BSC154, Isolated From Biological Soil Crust of Moab, Utah

Description

Bacillus sp. BSC154 was isolated from a biological soil crust near Moab, Utah. The strain appears to be capable of chemotaxis and biofilm production. The BSC154 genome contains iron siderophore production, nitrate reduction, mixed acid-butanediol fermentation, and assimilatory and dissimilatory sulfate metabolism pathways.

Date Created
2014-11-13
Agent

Using Dendritic Heat Maps to Simultaneously Display Genotype Divergence With Phenotype Divergence

Description

The advancement of techniques to visualize and analyze large-scale sequencing datasets is an area of active research and is rooted in traditional techniques such as heat maps and dendrograms. We introduce dendritic heat maps that display heat map results over aligned DNA sequence clusters for a range of clustering cutoffs. Dendritic heat maps aid in visualizing the effects of group differences on clustering hierarchy and relative abundance of sampled sequences. Here, we artificially generate two separate datasets with simplified mutation and population growth procedures with GC content group separation to use as example phenotypes. In this work, we use the term phenotype to represent any feature by which groups can be separated. These sequences were clustered in a fractional identity range of 0.75 to 1.0 using agglomerative minimum-, maximum-, and average-linkage algorithms, as well as a divisive centroid-based algorithm. We demonstrate that dendritic heat maps give freedom to scrutinize specific clustering levels across a range of cutoffs, track changes in phenotype inequity across multiple levels of sequence clustering specificity, and easily visualize how deeply rooted changes in phenotype inequity are in a dataset. As genotypes diverge in sample populations, clusters are shown to break apart into smaller clusters at higher identity cutoff levels, similar to a dendrogram. Phenotype divergence, which is shown as a heat map of relative abundance bin response, may or may not follow genotype divergences. This joined view highlights the relationship between genotype and phenotype divergence for treatment groups. We discuss the minimum-, maximum-, average-, and centroid-linkage algorithm approaches to building dendritic heat maps and make a case for the divisive “top-down” centroid-based clustering methodology as being the best option to visualize the effects of changing factors on clustering hierarchy and relative abundance.
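
The data behind such a figure can be sketched as follows. This is a minimal illustration, not the published method: it uses only minimum-linkage (single-linkage) grouping rather than the centroid-based approach the abstract favors, and the positional identity measure, the group-share statistic, and the function names are assumptions. For each identity cutoff it lists the clusters and the share of each cluster's members that belong to one group, which is the kind of value a heat map cell would color.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def single_linkage(seqs, cutoff):
    """Group sequences linked by any chain of pairs with identity >= cutoff."""
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(seqs)), 2):
        if identity(seqs[a], seqs[b]) >= cutoff:
            parent[find(a)] = find(b)
    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def dendritic_rows(seqs, labels, cutoffs, group):
    """For each cutoff, list (cluster members, share of members in `group`)."""
    rows = {}
    for cutoff in cutoffs:
        rows[cutoff] = [
            (members, sum(labels[i] == group for i in members) / len(members))
            for members in single_linkage(seqs, cutoff)
        ]
    return rows


# Hypothetical toy data: four 8-base sequences from two groups, "x" and "y".
seqs = ["AAAAAAAA", "AAAAAAAT", "AAAATTAA", "CCCCCCCC"]
labels = ["x", "x", "y", "y"]
for cutoff, clusters in dendritic_rows(seqs, labels, [0.75, 0.875, 1.0], "y").items():
    print(cutoff, clusters)
```

Because single-linkage clusters at a higher cutoff are always subsets of clusters at a lower cutoff, stacking the rows from low to high cutoff gives the dendrogram-like branching described above, with the group share supplying the heat map value at each level.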

Date Created
2016-08-18
Agent

Resolving Prokaryotic Taxonomy Without rRNA: Longer Oligonucleotide Word Lengths Improve Genome and Metagenome Taxonomic Classification

Description

Oligonucleotide signatures, especially tetranucleotide signatures, have been used as a method for homology binning by exploiting an organism’s inherent biases towards the use of specific oligonucleotide words. Tetranucleotide signatures have been especially useful in environmental metagenomic samples, as many of these samples contain organisms from poorly classified phyla which cannot be easily identified using traditional homology methods, including NCBI BLAST. This study examines oligonucleotide signatures across 1,424 completed genomes from across the tree of life, substantially expanding upon previous work. A comprehensive analysis of mononucleotide through nonanucleotide word lengths suggests that longer word lengths substantially improve the classification of DNA fragments across a range of sizes of relevance to high throughput sequencing. We find that, at present, heptanucleotide signatures represent an optimal balance between prediction accuracy and computational time for resolving taxonomy using both genomic and metagenomic fragments. We directly compare the ability of tetranucleotide and heptanucleotide word lengths (tetranucleotide signatures are the current standard for oligonucleotide word usage analyses) for taxonomic binning of metagenome reads. We present evidence that heptanucleotide word lengths consistently provide more taxonomic resolving power, particularly in distinguishing between closely related organisms that are often present in metagenomic samples. This implies that longer oligonucleotide word lengths should replace tetranucleotide signatures for most analyses. Finally, we show that the application of longer word lengths to metagenomic datasets leads to more accurate taxonomic binning of DNA scaffolds and has the potential to substantially improve taxonomic assignment and assembly of metagenomic data.
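
As a rough illustration of what an oligonucleotide signature is, the sketch below counts overlapping words of length k and assigns a fragment to the nearest reference signature. The abstract does not specify the classifier or the distance measure, so the Euclidean distance, the nearest-reference rule, and all function names here are assumptions rather than the study's method.

```python
from collections import Counter
from itertools import product
from math import sqrt


def signature(seq, k):
    """Relative frequency of every DNA word of length k, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)  # words containing N etc. are ignored
    return [counts[w] / total if total else 0.0 for w in words]


def distance(sig_a, sig_b):
    """Euclidean distance between two signatures of the same word length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


def classify(fragment, references, k):
    """Name of the reference whose signature is nearest to the fragment's."""
    frag_sig = signature(fragment, k)
    return min(
        references,
        key=lambda name: distance(frag_sig, signature(references[name], k)),
    )


# Hypothetical toy references; real use would supply genome sequences.
references = {
    "at_rich": "ATATATATATATATAT",
    "gc_rich": "GCGCGCGCGCGCGCGC",
}
print(classify("TATATATA", references, k=2))
```

The signature length grows as 4 to the power k, so a tetranucleotide signature has 256 entries and a heptanucleotide signature has 16,384, which is one way to see the trade-off between resolving power and computational time that the abstract describes.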

Date Created
2013-07-01
Agent

Coordinating Environmental Genomics and Geochemistry Reveals Metabolic Transitions in a Hot Spring Ecosystem

Description

We have constructed a conceptual model of biogeochemical cycles and metabolic and microbial community shifts within a hot spring ecosystem via coordinated analysis of the “Bison Pool” (BP) Environmental Genome and a complementary contextual geochemical dataset of ∼75 geochemical parameters. 2,321 16S rRNA clones and 470 megabases of environmental sequence data were produced from biofilms at five sites along the outflow of BP, an alkaline hot spring in Sentinel Meadow (Lower Geyser Basin) of Yellowstone National Park. This channel acts as a >22 m gradient of decreasing temperature, increasing dissolved oxygen, and changing availability of biologically important chemical species, such as those containing nitrogen and sulfur. Microbial life at BP transitions from a 92°C chemotrophic streamer biofilm community in the BP source pool to a 56°C phototrophic mat community. We improved automated annotation of the BP environmental genomes using BLAST-based Markov clustering. We have also assigned environmental genome sequences to individual microbial community members by complementing traditional homology-based assignment with nucleotide word-usage algorithms, allowing more than 70% of all reads to be assigned to source organisms. This assignment yields high genome coverage in dominant community members, facilitating reconstruction of nearly complete metabolic profiles and in-depth analysis of the relation between geochemical and metabolic changes along the outflow. We show that changes in environmental conditions and energy availability are associated with dramatic shifts in microbial communities and metabolic function. We have also identified an organism constituting a novel phylum in a metabolic “transition” community, located physically between the chemotroph- and phototroph-dominated sites. The complementary analysis of biogeochemical and environmental genomic data from BP has allowed us to build ecosystem-based conceptual models for this hot spring, reconstructing whole metabolic networks in order to illuminate community roles in shaping and responding to geochemical variability.
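
For readers unfamiliar with Markov clustering, the general technique can be sketched in a few lines. This is a generic, minimal version of the algorithm and not the pipeline used in the study: the similarity matrix stands in for BLAST-derived scores, and the inflation value, iteration count, pruning threshold, and function name are illustrative assumptions.

```python
def mcl(similarity, inflation=2.0, iterations=50):
    """Minimal Markov clustering on a symmetric similarity matrix."""
    n = len(similarity)

    def normalize(mat):
        # Scale every column so that it sums to 1 (column-stochastic matrix).
        sums = [sum(mat[i][j] for i in range(n)) for j in range(n)]
        return [[mat[i][j] / sums[j] for j in range(n)] for i in range(n)]

    # Add self-loops so that no column is empty, then normalize.
    m = normalize(
        [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(iterations):
        # Expansion: square the matrix, which simulates two-step random walks.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power and renormalize, which
        # strengthens strong links and weakens weak ones.
        m = normalize([[v ** inflation for v in row] for row in m])
    # Rows that retain mass act as attractors; their nonzero columns are members.
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical similarity scores: sequences 0-1 and 2-3 form two families.
similarity = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
print(mcl(similarity))
```

In annotation workflows of this kind, each resulting cluster is typically treated as a candidate protein family, so that an annotation for one member can be considered for the others. Larger inflation values generally yield finer-grained clusters.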

Date Created
2012-06-04
Agent

Ethnogeology at the core of basic and applied research: surface water systems and mode of action of a natural antibacterial clay of the Colombian Amazon

Description
Amazonia, inhabited and investigated for millennia, continues to astonish scientists with its cultural and natural diversity. Although Amazonia is rapidly changing, its vast and varied landscape still contains a complex natural pharmacopeia. The Amazonian tribes have accrued valuable environmental and geological knowledge that can be studied. This dissertation demonstrates that Indigenous Knowledge considered alongside Western Science can enhance our understanding of the relationship of people to geological materials and hydrological resources, and reveal mineral medicines with practical applications.

I used methods from anthropology and geology to explore the geological knowledge of the Uitoto, a tribe of the Colombian Amazon. The Uitoto use two metaphors to describe Earth systems: (1) the earth is a body, and (2) the Amazon is a tree. I found that they classify surface-water systems according to observable characteristics and use mineral clays to treat various maladies. I argue that Uitoto knowledge about Amazonian mineral resources and surface water is practical, empirically based and, in many cases, more nuanced than mainstream scientific knowledge.

I studied the mode of action of a natural antibacterial clay from the Colombian Amazon (AMZ) to discover whether the Uitoto’s claims about the clay’s medicinal values were verifiable using the methods of Western Science. Natural antibacterial clays can inhibit the growth of human pathogens. Methods from microbiology and geochemistry were combined to evaluate the mineral-microbe interactions that inhibit growth of model Gram-negative (Escherichia coli) and Gram-positive (Bacillus subtilis) bacteria. The AMZ antibacterial clay contains 45% kaolinites and 30% smectites. Its high surface area maintains an acidic environment (pH 4.5) and releases high concentrations of aluminum. Aluminum accumulates in the outer membrane of E. coli by binding to phospholipids. Furthermore, the membrane’s permeability increases due to synergistic effects between aluminum and transition metals released from the AMZ (i.e. Fe, Cu). The changes in the membrane may compromise its function as a barrier. Understanding the antibacterial mechanism of AMZ is key for its safe use as a natural product. These findings can help us harness the capabilities of antibacterial clays more efficiently.

Lastly, I integrated the results of this work in place-based, cross-cultural educational materials tailored for the tribal schools in the Colombian Amazon. The design of the units was informed by principles of curriculum design and successful pedagogic approaches for Native American students. The purpose of these educational materials is to return the results of research, enhance learning and participation of indigenous peoples in geosciences, and respond to the multicultural and plurilingual educational needs in countries such as Colombia.
Date Created
2016
Agent

Integrating metagenomics and geochemistry: functional evolution and taxonomic classification of hot spring communities

Description
The taxonomic and metabolic profile of the microbial community inhabiting a natural system is largely determined by the physical and geochemical properties of the system. However, the influences of parameters beyond temperature, pH and salinity have been poorly analyzed with few studies incorporating the comprehensive suite of physical and geochemical measurements required to fully investigate the complex interactions known to exist between biology and the environment. Further, the techniques used to classify the taxonomic and functional composition of a microbial community are fragmented and unwieldy, resulting in unnecessarily complex and often non-consilient results.

This dissertation integrates environmental metagenomes with extensive geochemical metadata for the development and application of multidimensional biogeochemical metrics. Analysis techniques including a Markov cluster-based evolutionary distance between whole communities, oligonucleotide signature-based taxonomic binning and principal component analysis of geochemical parameters allow for the determination of correlations between microbial community dynamics and environmental parameters. Together, these techniques allow for the taxonomic classification and functional analysis of the evolution of hot spring communities. Further, these techniques provide insight into specific geochemistry-biology interactions which enable targeted analyses of community taxonomic and functional diversity. Finally, analysis of synonymous substitution rates among physically separated microbial communities provides insights into microbial dispersion patterns and the roles of environmental geochemistry and community metabolism on DNA transfer among hot spring communities.

The data presented here confirm temperature and pH as the primary factors shaping the evolutionary trajectories of microbial communities. However, the integration of extensive geochemical metadata reveals new links between geochemical parameters and the distribution and functional diversification of communities. Further, an overall geochemical gradient (from multivariate analyses) between natural systems provides one of the most complete predictions of microbial community functional composition and inter-community DNA transfer rates. Finally, the taxonomic classification and clustering techniques developed within this dissertation will facilitate future genomic and metagenomic studies through enhanced community profiling obtainable via Markov clustering, longer oligonucleotide signatures and insight into PCR primer biases.
Date Created
2014
Agent

Merging Metagenomics and Geochemistry Reveals Environmental Controls on Biological Diversity and Evolution

Description

Background: The metabolic strategies employed by microbes inhabiting natural systems are, in large part, dictated by the physical and geochemical properties of the environment. This study sheds light on the complex relationship between biology and environmental geochemistry using forty-three metagenomes collected from geochemically diverse and globally distributed natural systems. It is widely hypothesized that many uncommonly measured geochemical parameters affect community dynamics, and this study leverages the development and application of multidimensional biogeochemical metrics to study correlations between geochemistry and microbial ecology. Analysis techniques such as a Markov cluster-based measure of the evolutionary distance between whole communities and a principal component analysis (PCA) of the geochemical gradients between environments allow for the determination of correlations between microbial community dynamics and environmental geochemistry and provide insight into which geochemical parameters most strongly influence microbial biodiversity.

Results: By progressively building from samples taken along well-defined geochemical gradients to samples widely dispersed in geochemical space, this study reveals strong links between the extent of taxonomic and functional diversification of resident communities and environmental geochemistry, and reveals temperature and pH as the primary factors that have shaped the evolution of these communities. Moreover, the inclusion of extensive geochemical data into analyses reveals new links between geochemical parameters (e.g. oxygen and trace element availability) and the distribution and taxonomic diversification of communities at the functional level. Further, an overall geochemical gradient (from multivariate analyses) between natural systems provides one of the most complete predictions of microbial taxonomic and functional composition.

Conclusions: Clustering based on the frequency with which orthologous proteins occur among metagenomes facilitated accurate prediction of the ordering of community functional composition along geochemical gradients, despite a lack of geochemical input. The consistency in the results obtained from the application of Markov clustering and multivariate methods to distinct natural systems underscores their utility in predicting the functional potential of microbial communities within a natural system based on system geochemistry alone, allowing geochemical measurements to be used to predict purely biological metrics such as microbial community composition and metabolism.

Date Created
2014-05-28
Agent