Circular RNA characterization and regulatory network prediction in human tissue

Circular RNAs (circRNAs) are a class of endogenous, non-coding RNAs that form when exons back-splice to each other, and they represent a new area of transcriptomics research. Numerous RNA sequencing (RNAseq) studies since 2012 have revealed that circRNAs are pervasively expressed in eukaryotes, especially in the mammalian brain. While their functional roles and impact remain to be clarified, circRNAs have been found to regulate micro-RNAs (miRNAs) as well as parental gene transcription, and they may thus play key roles in transcriptional regulation. Although circRNAs continue to gain attention, our understanding of their expression in a cell-, tissue-, and brain region-specific context remains limited. Further, different computational algorithms produce varied results in terms of which circRNAs are detected. This thesis aims to advance current knowledge of region-specific circRNA expression, focusing on the human brain, and to address these computational challenges.
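To make the back-splicing idea concrete, the minimal Python sketch below flags a split RNAseq read as supporting a back-splice junction when its second aligned segment lies upstream of its first in transcript orientation. The `Segment` class, the `is_back_splice` function, and the assumption that segments arrive in transcript 5'→3' order are illustrative and not taken from any specific circRNA detection tool; real detectors also check splice-site motifs, mapping quality, and supporting read counts.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One aligned piece of a split read (0-based, half-open genome coordinates)."""
    chrom: str
    start: int
    end: int
    strand: str  # '+' or '-'


def is_back_splice(first: Segment, second: Segment) -> bool:
    """True if two consecutive segments (assumed in transcript 5'->3' order) imply a back-splice.

    A linear splice continues downstream in transcript orientation; a back-splice
    jumps back upstream, joining a downstream exon's 3' end to an upstream exon's 5' end.
    """
    if first.chrom != second.chrom or first.strand != second.strand:
        return False  # inter-chromosomal or strand-switching chimeras are out of scope
    if first.strand == "+":
        # On '+', upstream means lower genomic coordinates.
        return second.start < first.start
    # On '-', upstream (in transcript orientation) means higher genomic coordinates.
    return second.end > first.end


# The second segment maps upstream of the first on '+', so this read supports a back-splice.
print(is_back_splice(Segment("chr1", 1000, 1050, "+"), Segment("chr1", 500, 550, "+")))
```

A linear splice (second segment downstream of the first) returns `False`, which is what separates candidate circRNA reads from ordinary mRNA reads in this simplified view.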

The overarching goal of my research unfolds over three aims: (i) evaluating circRNAs and their predicted impact on transcriptional regulatory networks in cell-specific RNAseq data; (ii) developing a novel solution for de novo detection of full-length circRNAs, as well as in silico validation of selected circRNA junctions, using assembly; and (iii) applying these assembly-based detection and validation workflows, together with existing tools, to systematically identify and characterize circRNAs in functionally distinct human brain regions. To this end, I have developed novel bioinformatics workflows that are applicable to non-polyA-selected RNAseq datasets and can be used to characterize circRNA expression across various sample types and diseases. Further, I establish a reference dataset of brain region-specific circRNA expression profiles and regulatory networks. This resource, along with existing databases such as circBase, will be invaluable in advancing circRNA research and in improving our understanding of the role of circRNAs in transcriptional regulation and various neurological conditions.
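As a hypothetical illustration of junction-level checking, the sketch below builds a back-splice junction (BSJ) sequence from two exon sequences and counts reads containing a probe that crosses the junction. This exact substring match is a simplified stand-in, not the assembly-based workflow described above; the function names, the `flank` and `min_overhang` parameters, and the assumption that reads are already on the transcript strand are all assumptions.

```python
def junction_sequence(upstream_exon: str, downstream_exon: str, flank: int) -> str:
    """BSJ sequence: last `flank` bases of the downstream exon + first `flank` bases of the upstream exon."""
    if flank > len(upstream_exon) or flank > len(downstream_exon):
        raise ValueError("flank longer than an exon")
    return downstream_exon[-flank:] + upstream_exon[:flank]


def count_spanning_reads(reads, junction: str, flank: int, min_overhang: int) -> int:
    """Count reads containing a probe that extends `min_overhang` bases on each side of the junction."""
    if min_overhang > flank:
        raise ValueError("min_overhang cannot exceed flank")
    probe = junction[flank - min_overhang: flank + min_overhang]
    return sum(1 for read in reads if probe in read)


bsj = junction_sequence("AAAACCCC", "GGGGTTTT", flank=4)  # "TTTTAAAA"
reads = ["CGTTTAAACG", "GGGGTTTT", "TTTTAAAA"]
print(count_spanning_reads(reads, bsj, flank=4, min_overhang=3))  # 2
```

Requiring an overhang on both sides matters: a read lying entirely within one exon (like `"GGGGTTTT"`) cannot distinguish a circular product from the linear transcript.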
Date Created

Functional and proteome differences in skeletal muscle mitochondria between lean and obese humans

Skeletal muscle (SM) mitochondria generate the majority of adenosine triphosphate (ATP) in SM and help regulate whole-body energy expenditure. Obesity is associated with alterations in SM mitochondria, which are unique with respect to their arrangement within cells: some mitochondria are located directly beneath the sarcolemma (i.e., subsarcolemmal (SS) mitochondria), while others are nested between the myofibrils (i.e., intermyofibrillar (IMF) mitochondria). Functional and proteome differences specific to SS versus IMF mitochondria in obese individuals may contribute to the reduced capacity for muscle ATP production seen in obesity. The overall goals of this work were to (1) isolate functional muscle SS and IMF mitochondria from lean and obese individuals, (2) assess enzyme activities associated with the electron transport chain and ATP production, (3) determine whether elevated plasma amino acids enhance SS and IMF mitochondrial respiration and ATP production rates in SM of obese humans, and (4) determine differences between lean and obese humans in the mitochondrial proteome regulating energy metabolism and in key biological processes associated with SS and IMF mitochondria.

Polarography was used to determine functional differences in isolated SS and IMF mitochondria between lean (37 ± 3 yrs; n = 10) and obese (35 ± 3 yrs; n = 11) subjects during either saline (control) or amino acid (AA) infusion. In lean subjects (n = 10; P < 0.05), AA infusion increased ADP-stimulated respiration (i.e., coupled respiration), non-ADP-stimulated respiration (i.e., uncoupled respiration), and ATP production rates in SS, but not IMF, mitochondria. Neither infusion increased any of these parameters in SS or IMF mitochondria of the obese subjects.
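Coupled and non-ADP-stimulated rates like these are often summarized as a respiratory control ratio (RCR), the ratio of ADP-stimulated (state 3) to non-ADP-stimulated (state 4) oxygen consumption. The Python sketch below, with hypothetical function names and toy numbers, estimates an oxygen consumption rate from a polarographic trace by least-squares slope and then forms the RCR; whether this study reported RCR is not stated in the abstract.

```python
def o2_consumption_rate(times_min, o2_nmol, protein_mg):
    """nmol O2 / min / mg protein, from the least-squares slope of chamber O2 over time."""
    n = len(times_min)
    if n < 2 or n != len(o2_nmol):
        raise ValueError("need at least two paired time points")
    mean_t = sum(times_min) / n
    mean_o = sum(o2_nmol) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(times_min, o2_nmol))
    var = sum((t - mean_t) ** 2 for t in times_min)
    slope = cov / var  # negative while O2 is being consumed
    return -slope / protein_mg


def respiratory_control_ratio(state3_rate, state4_rate):
    """RCR = ADP-stimulated (state 3) rate / non-ADP-stimulated (state 4) rate."""
    if state4_rate <= 0:
        raise ValueError("state 4 rate must be positive")
    return state3_rate / state4_rate


# Toy traces (hypothetical numbers) with 0.5 mg mitochondrial protein in the chamber.
state3 = o2_consumption_rate([0, 1, 2, 3], [400, 380, 360, 340], 0.5)  # 40.0
state4 = o2_consumption_rate([0, 1, 2, 3], [400, 395, 390, 385], 0.5)  # 10.0
print(respiratory_control_ratio(state3, state4))  # 4.0
```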

Using label-free quantitative mass spectrometry, we determined differences in the proteomes of SM SS and IMF mitochondria between lean (33 ± 3 yrs; n = 16) and obese (32 ± 3 yrs; n = 17) subjects. Proteins differentially expressed between SS and IMF mitochondria of obese subjects were associated with biological processes including the electron transport chain (P < 0.0001), citric acid cycle (P < 0.0001), oxidative phosphorylation (P < 0.001), branched-chain amino acid degradation (P < 0.0001), and fatty acid degradation (P < 0.001). Overall, these findings show that obesity is associated with a redistribution, within the mitochondrial reticulum, of key biological processes that regulate energy metabolism in human skeletal muscle.
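Process-level P values like these typically come from an over-representation test. As an assumption about the general approach (the abstract does not name the test used), the sketch below computes a one-sided hypergeometric enrichment P value: the probability of seeing at least `k` annotated proteins among `n` differentially expressed ones, drawn from `N` quantified proteins of which `K` are annotated to the process.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k).

    N: quantified proteins; K: proteins annotated to the process;
    n: differentially expressed (DE) proteins; k: annotated DE proteins.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # comb() returns 0 when the lower argument exceeds the upper, so impossible terms vanish.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical example: 12 of 40 DE proteins fall in a 60-protein process out of 1,500 quantified.
print(enrichment_p_value(k=12, n=40, K=60, N=1500))
```

In practice the resulting P values across many processes would also be corrected for multiple testing.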

Topological analysis of biological pathways: genes, microRNAs and pathways involved in hepatocellular carcinoma

Rewired biological pathways and/or rewired microRNA (miRNA)-mRNA interactions might influence the activity of biological pathways. Here, rewired biological pathways are defined as the differential (rewiring) effects of genes on the topology of biological pathways between controls and cases. Similarly, rewired miRNA-mRNA interactions are defined as the differential (rewiring) effects of miRNAs on the topology of biological pathways between controls and cases. This dissertation discusses how rewired biological pathways (Chapter 1) and/or rewired miRNA-mRNA interactions (Chapter 2) aberrantly influence the activity of biological pathways, and their association with disease.

This dissertation proposes two PageRank-based analytical methods, Pathways of Topological Rank Analysis (PoTRA) and miR2Pathway, discussed in Chapter 1 and Chapter 2, respectively. PoTRA focuses on detecting pathways with an altered number of hub genes between two phenotypes. PoTRA is based on the observation that loss of connectivity is a common topological trait of cancer networks, together with the prior knowledge that a normal biological network is scale-free: its degree distribution follows a power law, so a small number of nodes are hubs and a large number are non-hubs. During the transition from normal to cancer, the loss of connectivity may therefore reflect disruption of this scale-free structure; that is, the number of hub genes may be altered in cancer compared with normal samples. Hence, it is hypothesized that if the number of hub genes in a pathway differs between normal and cancer, this pathway might be involved in cancer. MiR2Pathway quantifies the differential effects of miRNAs on the activity of a biological pathway when miRNA-mRNA connections are altered from normal to disease, and ranks the disease risk of rewired miRNA-mediated biological pathways. This dissertation explores how rewired gene-gene interactions and rewired miRNA-mRNA interactions lead to aberrant activity of biological pathways, and ranks pathways by their disease risk. The two proposed methods can complement existing genomics analysis methods to facilitate systems-level study of the biological mechanisms behind disease.
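The PoTRA idea of comparing hub-gene counts between phenotypes can be sketched as follows: compute PageRank on a pathway network for each phenotype and count nodes whose score exceeds a threshold. This Python sketch is illustrative only. Power-iteration PageRank with uniform redistribution of dangling-node mass is standard, but the hub threshold (here a multiple of the uniform score 1/n), the treatment of interactions as undirected, and the function names are hypothetical and not necessarily PoTRA's actual criteria.

```python
def pagerank(nodes, edges, d=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a directed graph; dangling-node mass is spread uniformly."""
    n = len(nodes)
    out = {v: [] for v in nodes}
    for u, v in edges:
        out[u].append(v)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(pr[v] for v in nodes if not out[v])
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += d * pr[u] / len(out[u])
        converged = sum(abs(new[v] - pr[v]) for v in nodes) < tol
        pr = new
        if converged:
            break
    return pr


def count_hubs(pr, factor=1.5):
    """Hypothetical hub rule: PageRank above `factor` times the uniform score 1/n."""
    threshold = factor / len(pr)
    return sum(1 for score in pr.values() if score > threshold)


def undirected(pairs):
    """Treat gene-gene interactions as undirected by adding both directions."""
    return list(pairs) + [(b, a) for a, b in pairs]


genes = ["G0", "G1", "G2", "G3", "G4"]
normal = undirected([("G0", g) for g in genes[1:]])                      # star: G0 is a hub
cancer = undirected([(genes[i], genes[(i + 1) % 5]) for i in range(5)])  # ring: no hub
print(count_hubs(pagerank(genes, normal)), count_hubs(pagerank(genes, cancer)))  # 1 0
```

Under the hypothesis above, a pathway whose hub count changes between phenotypes (here from 1 to 0) would be flagged as potentially involved in cancer; a real analysis would also assess whether the difference is statistically significant.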

Next-Generation Sequencing for DNA Methylation Profiling in Blood and Skeletal Muscle

DNA methylation, an epigenetic mechanism, has been found to be a significant marker associated with variations in gene expression and activity across the entire human genome. At present, however, there is little to no information about how DNA methylation varies between tissues within a single person's body. Using data from a preliminary study of lean and obese clinical subjects, this study attempts to construct a profile of the differences in DNA methylation between two tissues from this subject group: blood and skeletal muscle. This allows us to begin describing the epigenetic changes that influence how differently these two tissues operate, and how these tissues differ between individuals of different weight classes, especially in the context of developing symptoms of type 2 diabetes.
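At each CpG, bisulfite sequencing reads a methylated cytosine as C and an unmethylated cytosine as T, so the methylation level is the fraction of C reads. The hypothetical sketch below computes per-site levels and muscle-minus-blood differences in percentage points; the function names and the `min_coverage` filter of 10 reads are assumptions chosen for illustration, not values from this study.

```python
def methylation_level(methylated: int, unmethylated: int) -> float:
    """Fraction of bisulfite reads reporting C (methylated) rather than T at one CpG."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("no coverage at this CpG")
    return methylated / total


def tissue_differences(blood, muscle, min_coverage=10):
    """Muscle-minus-blood methylation (percentage points) at CpGs covered in both tissues.

    `blood` and `muscle` map a CpG key such as ("chr1", 100) to (methylated, unmethylated) read counts.
    """
    diffs = {}
    for site in blood.keys() & muscle.keys():
        b_meth, b_unmeth = blood[site]
        m_meth, m_unmeth = muscle[site]
        if b_meth + b_unmeth >= min_coverage and m_meth + m_unmeth >= min_coverage:
            diffs[site] = 100 * (methylation_level(m_meth, m_unmeth)
                                 - methylation_level(b_meth, b_unmeth))
    return diffs


blood = {("chr1", 100): (2, 8), ("chr1", 200): (1, 2)}
muscle = {("chr1", 100): (9, 1), ("chr1", 200): (3, 0)}
print(tissue_differences(blood, muscle))  # only chr1:100 passes the coverage filter
```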

Novel methods of biomarker discovery and predictive modeling using Random Forest

Random forest (RF) is a popular and powerful machine learning technique that can be used for classification, regression, and unsupervised clustering. In its original form, introduced by Leo Breiman, RF is used as a predictive model to generate predictions for new observations. Recent studies have proposed several RF-based methods for feature selection and for generating prediction intervals; however, these methods are limited in their applicability and accuracy. In this dissertation, RF is applied to build a predictive model for a complex dataset and is used as the basis for two novel methods: one for biomarker discovery and one for generating prediction intervals.

First, a biodosimetry model was developed using RF to determine absorbed radiation dose from gene expression measured in blood samples of potentially exposed individuals. To improve its prediction accuracy, day-specific models were built to account for day interaction effects, and a nested modeling technique was proposed. The nested models can fit these complex data, which show large variability and non-linear relationships.

Second, a panel of biomarkers was selected using a data-driven feature selection method together with hand-picking based on prior knowledge and other constraints. To incorporate domain knowledge, a method called Know-GRRF was developed based on guided regularized RF (GRRF). This method incorporates domain knowledge as a penalty term that regulates the selection of candidate features in RF, which adds flexibility to data-driven feature selection and can improve model interpretability. Know-GRRF showed significant improvement in cross-species prediction when cross-species correlation was used to guide biomarker selection. In simulated datasets, the method can also compete with existing methods when intrinsic data characteristics are used as an alternative to domain knowledge.
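In regularized random forests, a split on a feature not yet used by the model has its gain multiplied by a penalty coefficient, and GRRF derives that coefficient from a guiding score. The Python sketch below assumes, for illustration, that a normalized domain-knowledge score plays that guiding role, in the spirit of Know-GRRF. The coefficient follows the general GRRF form `(1 - gamma) * lambda0 + gamma * score / max_score`; the function names and parameters are hypothetical and this is not the dissertation's implementation.

```python
def penalty_coefficients(scores, gamma, lambda0=1.0):
    """GRRF-style coefficients: (1 - gamma) * lambda0 + gamma * score / max(score).

    `scores` is assumed to be a non-negative domain-knowledge score per feature.
    """
    max_score = max(scores.values())
    if max_score <= 0:
        raise ValueError("need at least one positive score")
    return {f: (1 - gamma) * lambda0 + gamma * s / max_score for f, s in scores.items()}


def choose_split_feature(gains, selected, coef):
    """Pick the feature with the largest regularized gain at one tree node.

    Features already in the selected set keep their raw gain; new features are penalized.
    """
    def regularized_gain(feature):
        if feature in selected:
            return gains[feature]
        return coef[feature] * gains[feature]
    return max(gains, key=regularized_gain)


coef = penalty_coefficients({"geneA": 1.0, "geneB": 0.1}, gamma=0.9)
gains = {"geneA": 0.5, "geneB": 0.8}
print(choose_split_feature(gains, selected=set(), coef=coef))      # geneA
print(choose_split_feature(gains, selected={"geneB"}, coef=coef))  # geneB
```

Larger `gamma` lets the knowledge score dominate: here the low-scoring `geneB` loses despite a higher raw gain unless it has already been selected.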

Lastly, a novel non-parametric method, RFerr, was developed to generate prediction intervals using RF regression. This method is applicable to any predictive model and was shown to have better coverage and precision than existing methods on the real-world radiation dataset, as well as on benchmark and simulated datasets.
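One generic way to build non-parametric prediction intervals from RF regression is to take empirical quantiles of out-of-bag (OOB) residuals and add them to each new prediction. The sketch below implements that residual-quantile approach in plain Python; it is a named, simpler stand-in and is not claimed to be RFerr's procedure. The OOB predictions are assumed to come from an already-fitted forest, and the function names are hypothetical.

```python
def quantile(sorted_values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def residual_quantile_intervals(oob_pred, y_true, new_pred, alpha=0.1):
    """(1 - alpha) intervals: each new prediction plus the alpha/2 and 1 - alpha/2 OOB residual quantiles."""
    if len(oob_pred) != len(y_true) or not y_true:
        raise ValueError("need matching, non-empty OOB predictions and targets")
    residuals = sorted(y - p for y, p in zip(y_true, oob_pred))
    lower = quantile(residuals, alpha / 2)
    upper = quantile(residuals, 1 - alpha / 2)
    return [(p + lower, p + upper) for p in new_pred]


# Toy OOB residuals are -2, -1, 0, 1, 2; a 50% interval spans their 25th to 75th percentiles.
print(residual_quantile_intervals([2, 1, 0, -1, -2], [0, 0, 0, 0, 0], [10.0], alpha=0.5))  # [(9.0, 11.0)]
```

Because every interval has the same width, this simple version cannot adapt to observations that are harder to predict, which is one reason more refined methods exist.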

Investigation of DNA methylation in obesity and its underlying insulin resistance

Obesity and its underlying insulin resistance are caused by environmental and genetic factors. DNA methylation provides a mechanism by which environmental factors can regulate transcriptional activity. The overall goals of the work herein were to (1) identify alterations in DNA methylation in human skeletal muscle with obesity and its underlying insulin resistance, (2) determine whether these methylation changes can be altered through weight loss induced by bariatric surgery, and (3) identify DNA methylation biomarkers in whole blood that can serve as a surrogate for skeletal muscle.

DNA methylation was assessed in human skeletal muscle and blood using reduced representation bisulfite sequencing (RRBS) for high-throughput identification and pyrosequencing for site-specific confirmation. In skeletal muscle, Sorbin and SH3 homology domain 3 (SORBS3) showed increased methylation (+5.0 to +24.4%) in the promoter and 5’ untranslated region (UTR) in obese participants (n = 10) compared with lean participants (n = 12), and this corresponded with decreased gene expression (fold change: -1.9; P = 0.0001). Furthermore, in a separate cohort of morbidly obese participants (n = 7) undergoing surgically induced weight loss, SORBS3 methylation decreased (-5.6 to -24.2%) and gene expression increased (fold change: +1.7; P = 0.05) after surgery. Moreover, SORBS3 promoter methylation was shown in vitro to inhibit transcriptional activity (P = 0.000003). The methylation and transcriptional changes in SORBS3 were significantly (P ≤ 0.05) correlated with obesity measures and fasting insulin levels. SORBS3 was not identified in the blood methylation analysis of lean (n = 10) and obese (n = 10) participants, suggesting that it is a muscle-specific marker. However, solute carrier family 19 member 1 (SLC19A1) showed decreased 5’UTR methylation in both blood and skeletal muscle of obese participants, and this decrease was significantly (P ≤ 0.05) predicted by insulin sensitivity.
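Fold changes such as -1.9 and +1.7 are often reported on a signed scale, where a ratio below 1 is written as the negative reciprocal. Assuming that convention here (the abstract does not define it), the hypothetical helper below converts a case/control expression ratio into a signed fold change.

```python
def signed_fold_change(case_mean: float, control_mean: float) -> float:
    """Ratio if >= 1, otherwise the negative reciprocal (so halving reads as -2.0)."""
    if case_mean <= 0 or control_mean <= 0:
        raise ValueError("expression values must be positive")
    ratio = case_mean / control_mean
    return ratio if ratio >= 1 else -1 / ratio


print(signed_fold_change(1.0, 1.9))  # about -1.9: expression in cases is ~53% of controls
print(signed_fold_change(1.7, 1.0))  # 1.7
```

Under this convention a fold change of -1.9 corresponds to a case/control ratio of roughly 0.53, and values between -1 and +1 never occur.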

These findings suggest SLC19A1 as a potential blood-based biomarker for obese, insulin-resistant states. The collective findings on SORBS3 DNA methylation and gene expression present an exciting novel target in skeletal muscle for further understanding obesity and its underlying insulin resistance, as do the dynamic changes in SORBS3 in response to metabolic improvements and surgically induced weight loss.

BitTorious: Global Controlled Genomics Data Publication, Research, and Archiving Via BitTorrent Extensions

Background: Centralized silos of genomic data are architecturally easier to initially design, develop and deploy than distributed models. However, as interoperability pains in EHR/EMR, HIE and other collaboration-centric life sciences domains have taught us, the core challenge of networking genomics systems is not in the construction of individual silos, but in the interoperability of those deployments in a manner that embraces the heterogeneous needs, terms and infrastructure of collaborating parties. This article demonstrates the adaptation of BitTorrent to private collaboration networks in an authenticated, authorized and encrypted manner while retaining the characteristics of standard BitTorrent.
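For readers unfamiliar with the underlying protocol, the sketch below shows two standard BitTorrent pieces that a private tracker builds on: bencoding the torrent's info dictionary and taking its SHA-1 digest as the info_hash, then forming a tracker announce URL. The extra `token` query parameter stands in for per-user authorization and is hypothetical, as are the tracker URL and peer ID; the actual BitTorious authorization mechanism is not specified in this abstract.

```python
import hashlib
from urllib.parse import urlencode


def bencode(obj) -> bytes:
    """Minimal bencoder for int, str/bytes, list and dict (keys sorted as raw bytes)."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode("utf-8")
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(item) for item in obj) + b"e"
    if isinstance(obj, dict):
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in obj.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(obj).__name__}")


def announce_url(tracker: str, info: dict, peer_id: bytes, port: int, left: int, token: str) -> str:
    """Build an announce URL; `token` is a hypothetical per-user authorization parameter."""
    info_hash = hashlib.sha1(bencode(info)).digest()  # 20-byte SHA-1 of the bencoded info dict
    query = urlencode({
        "info_hash": info_hash, "peer_id": peer_id, "port": port,
        "uploaded": 0, "downloaded": 0, "left": left, "token": token,
    })
    return f"{tracker}?{query}"


info = {"name": "sample.bam", "piece length": 262144, "length": 1048576, "pieces": b"\x00" * 80}
print(announce_url("https://tracker.example.org/announce", info,
                   b"-XX0001-123456789012", 6881, 1048576, "user-token"))
```

Because the info_hash is computed from the info dictionary alone, adding authorization at the announce layer leaves the torrent content and swarm identity unchanged, which fits the goal of retaining standard BitTorrent characteristics.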

Results: The BitTorious portal was successfully used to manage many concurrent BitTorrent clients across the United States, exchanging genomics data payloads in excess of 500 GiB using the uTorrent client software on Linux, OS X and Windows platforms. Individual nodes were sporadically interrupted to verify the system's resilience to outages of a single client node, as well as the recovery of nodes resuming operation on intermittent Internet connections.

Conclusions: The authorization-based extension of BitTorrent and the accompanying BitTorious reference tracker and user management web portal provide a free, standards-based, general-purpose and extensible data distribution system for large ‘omics collaborations.


BitTorious Volunteer: Server-Side Extensions for Centrally-Managed Volunteer Storage in BitTorrent Swarms

Background: Our publication of the BitTorious portal [1] demonstrated the ability to create a privatized distributed data warehouse of sufficient magnitude for real-world bioinformatics studies using minimal changes to the standard BitTorrent tracker protocol. In this second phase, we release a new server-side specification to accept anonymous philanthropic storage donations from the general public, wherein a small portion of each user’s local disk may be used for archival of scientific data. We have implemented the server-side announcement and control portions of this BitTorrent extension in v3.0.0 of the BitTorious portal, upon which compatible clients may be built.

Results: Automated test cases for the BitTorious Volunteer extensions have been added to the portal’s v3.0.0 release, supporting validation of the “peer affinity” concept and announcement protocol introduced by this specification. Additionally, a separate reference implementation of affinity calculation has been provided in C++ for informaticians wishing to integrate it into libtorrent-based projects.
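The exact affinity formula in the BitTorious Volunteer specification is not given here. As a hypothetical stand-in that illustrates deterministic, centrally computable assignment of pieces to volunteers, the sketch below uses rendezvous (highest-random-weight) hashing: each piece goes to the `replicas` peers with the highest hash score. The function names and hashing scheme are assumptions, not the published affinity calculation.

```python
import hashlib


def affinity_score(peer_id: str, piece_index: int, info_hash: str) -> int:
    """Hypothetical affinity: a 64-bit score from SHA-256 of (torrent, piece, peer)."""
    digest = hashlib.sha256(f"{info_hash}:{piece_index}:{peer_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign_pieces(peers, num_pieces: int, info_hash: str, replicas: int = 2):
    """Give each piece to the `replicas` peers with the highest affinity (rendezvous hashing)."""
    assignment = {peer: [] for peer in peers}
    for piece in range(num_pieces):
        ranked = sorted(peers, key=lambda p: affinity_score(p, piece, info_hash), reverse=True)
        for peer in ranked[:replicas]:
            assignment[peer].append(piece)
    return assignment


print(assign_pieces(["vol-a", "vol-b", "vol-c"], num_pieces=6, info_hash="abc123"))
```

A useful property of this scheme is that the tracker and every client can compute the same assignment independently, and when a volunteer leaves, only the pieces it held need to move.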

Conclusions: The BitTorrent “affinity” extensions, as provided in the BitTorious portal reference implementation, allow data publishers to crowdsource the extreme storage prerequisites for research in “big data” fields. With sufficient awareness and adoption of BitTorious Volunteer-based clients by the general public, the BitTorious portal may be able to provide petascale storage resources to the scientific community at relatively low financial cost.


Association of SNPs in EGR3 and ARC With Schizophrenia Supports a Biological Pathway for Schizophrenia Risk

We have previously hypothesized a biological pathway of activity-dependent synaptic plasticity proteins that addresses the dual genetic and environmental contributions to schizophrenia. Accordingly, variations in the immediate early gene EGR3, and its target ARC, should influence schizophrenia susceptibility. We used a pooled Next-Generation Sequencing approach to identify variants across these genes in U.S. populations of European (EU) and African (AA) descent. Three EGR3 SNPs and one ARC SNP were selected and genotyped for validation, and three SNPs were tested for association in a replication cohort. In the EU group of 386 schizophrenia cases and 150 controls, EGR3 SNP rs1877670 and ARC SNP rs35900184 showed significant associations (p = 0.0078 and p = 0.0275, respectively). In the AA group of 185 cases and 50 controls, only the ARC SNP showed significant association (p = 0.0448). The ARC SNP did not show association in the Han Chinese (CH) population. However, combining the EU, AA, and CH groups revealed a highly significant association of ARC SNP rs35900184 (p = 2.353 × 10⁻⁷; OR [95% CI] = 1.54 [1.310–1.820]). These findings support previously reported associations between EGR3 and schizophrenia. Moreover, this is the first report associating an ARC SNP with schizophrenia, and it supports recent large-scale GWAS findings implicating the ARC complex in schizophrenia risk. These results support the need for further investigation of the proposed pathway of environmentally responsive, synaptic plasticity-related schizophrenia genes.
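Odds ratios and confidence intervals like the one quoted above follow the usual 2×2 layout. The sketch below computes an allelic odds ratio with a Woolf (log-scale) 95% interval from hypothetical counts; the function name is illustrative, and the combined EU, AA and CH estimate may have been obtained with a different pooling method, which the abstract does not state.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a, b: risk-allele and other-allele counts in cases;
    c, d: risk-allele and other-allele counts in controls.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (consider a continuity correction)")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical allele counts, not data from this study.
print(odds_ratio_ci(20, 10, 10, 20))  # OR = 4.0 with its 95% interval
```

An interval that excludes 1, as in the reported 1.310–1.820, indicates an association at roughly the 5% level under this approximation.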


Statistical Methods for Analyzing Immunosignatures

Background: Immunosignaturing is a new peptide-microarray-based technology for profiling humoral immune responses. Despite new challenges, immunosignaturing gives us the opportunity to explore new and fundamentally different research questions. In addition to classifying samples by disease status, the complex patterns and latent factors underlying immunosignatures, which we attempt to model, may have a diverse range of applications.

Methods: We investigate the utility of a number of statistical methods to determine model performance and address challenges inherent in analyzing immunosignatures. Some of these methods include exploratory and confirmatory factor analyses, classical significance testing, structural equation and mixture modeling.
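As one example of classical significance testing on immunosignature data, the sketch below ranks peptides by a Welch two-sample t statistic comparing disease and control intensities. The function names and the assumption that intensities are already log-transformed and normalized are hypothetical; the abstract does not specify the exact tests used.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def rank_peptides(cases, controls):
    """Rank peptides by |t|; `cases` and `controls` map peptide ID -> list of intensities."""
    scores = {pep: welch_t(cases[pep], controls[pep])[0] for pep in cases.keys() & controls.keys()}
    return sorted(scores, key=lambda pep: abs(scores[pep]), reverse=True)


cases = {"pep1": [2.0, 4.0, 6.0], "pep2": [1.1, 0.9, 1.0]}
controls = {"pep1": [1.0, 2.0, 3.0], "pep2": [1.0, 1.1, 0.9]}
print(rank_peptides(cases, controls))  # ['pep1', 'pep2']
```

With thousands of peptides per array, such per-peptide statistics would normally be paired with a multiple-testing correction before peptides are treated as disease-informative features.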

Results: We demonstrate an ability to classify samples based on disease status and show that immunosignaturing is a very promising technology for screening and presymptomatic screening of disease. In addition, we are able to model complex patterns and latent factors underlying immunosignatures. These latent factors may serve as biomarkers for disease and may play a key role in a bioinformatic method for antibody discovery.

Conclusion: Based on this research, we lay out an analytic framework illustrating how immunosignatures may be useful as a general method for screening and presymptomatic screening of disease as well as antibody discovery.
