Representation Learning for Trustworthy AI

Description
Artificial Intelligence (AI) systems have achieved outstanding performance and have been found to outperform humans at various tasks, such as sentiment analysis and face recognition. However, the majority of these state-of-the-art AI systems use complex Deep Learning (DL) methods, which make it challenging for human experts to design and evaluate such models with respect to privacy, fairness, and robustness. Recent examination of DL models reveals that their representations may include information that could lead to privacy violations, unfairness, and robustness issues. This results in AI systems that are potentially untrustworthy from a socio-technical standpoint. Trustworthiness in AI is defined by a set of model properties such as freedom from discriminatory bias, protection of users’ sensitive attributes, and lawful decision-making. The characteristics of trustworthy AI can be grouped into three categories: Reliability, Resiliency, and Responsibility. Past research has shown that the successful integration of an AI model depends on its trustworthiness. Thus, it is crucial for organizations and researchers to build trustworthy AI systems to facilitate the seamless integration and adoption of intelligent technologies. The main issue with existing AI systems is that they are primarily trained to improve technical measures, such as accuracy on a specific task, and do not take socio-technical measures into account. The aim of this dissertation is to propose methods for improving the trustworthiness of AI systems through representation learning. DL models’ representations contain information about a given input and can be used for tasks such as detecting fake news on social media or predicting the sentiment of a review. The findings of this dissertation significantly expand the scope of trustworthy AI research and establish a new paradigm for modifying data representations to balance the properties of trustworthy AI.
Specifically, this research investigates multiple techniques such as reinforcement learning for understanding trustworthiness in users’ privacy, fairness, and robustness in classification tasks like cyberbullying detection and fake news detection. Since most social measures in trustworthy AI cannot be used to fine-tune or train an AI model directly, the main contribution of this dissertation lies in using reinforcement learning to alter an AI system’s behavior based on non-differentiable social measures.
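The idea of steering a model with a measure that cannot be differentiated can be illustrated with a score-function (REINFORCE-style) update. The following is only a toy sketch, not the dissertation's method: the single-parameter Bernoulli policy, the step-function `social_reward`, and all hyperparameters are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def social_reward(action):
    # Hypothetical non-differentiable "social" measure: a step function
    # that rewards action 1 (e.g. hiding a sensitive attribute) and
    # gives nothing for action 0. No gradient flows through it.
    return 1.0 if action == 1 else 0.0


def train_policy(steps=2000, lr=0.1, seed=0):
    """Learn P(action = 1) = sigmoid(theta) from reward samples only."""
    rng = random.Random(seed)
    theta = 0.0      # policy parameter (logit of choosing action 1)
    baseline = 0.0   # running average reward, used to reduce variance
    for _ in range(steps):
        p = sigmoid(theta)
        action = 1 if rng.random() < p else 0
        reward = social_reward(action)
        # For a Bernoulli(sigmoid(theta)) policy,
        # d/dtheta log pi(action) = action - p.
        grad_log_prob = action - p
        theta += lr * (reward - baseline) * grad_log_prob
        baseline += 0.05 * (reward - baseline)
    return sigmoid(theta)


if __name__ == "__main__":
    print(f"P(action = 1) after training: {train_policy():.3f}")
```

The update only needs sampled rewards, which is why this family of estimators suits measures that are defined by counting or thresholding rather than by a smooth loss.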
Date Created
2023

It’s a TAD Complicated: Detecting Genomic Structural Alterations Using TAD Delineated Gene Expression Data

Description
Genomes are biologically complex entities where an alteration in structure can yield no effect or have a devastating effect on many pathways. Most of the focus has been on translocations that generate fusion proteins. However, this is only one of many outcomes. Recent work suggests alterations in topologically associated domains (TADs) can lead to changes in gene expression. It is hypothesized that alterations in genome structure can disrupt TADs, leading to an alteration in the variability of gene expression within the gene expression neighborhood defined by the TAD. To test this hypothesis, variability of gene expression for genes contained within TADs between 37 cancer cell lines from the NCI-60 cell line panel was compared with normal expression data for the corresponding tissues of origin. Those results were correlated with the data on structural events within the NCI-60 cell lines that would disrupt a TAD. It was observed that 2.4% of the TADs displayed altered variance in gene expression when comparing cancer to normal tissue. Using array CGH data from the cancer cell lines to map breakpoints within TADs, it was discovered that altered variance is always associated with a TAD disrupted by a breakpoint, but a breakpoint within a TAD does not always lead to altered variance. TADs with altered variance in gene expression were no different in size than those without altered variance. There is evidence of recurrent pan-cancer alteration in variance for eleven genes within two TADs on two chromosomes (chromosomes 10 and 19) for all 37 cell lines. The genes located within these TADs are enriched in pathways related to RNA processing. This study supports altered variance as a signal of a breakpoint with a functional consequence.
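The per-TAD comparison of expression variability described above can be sketched as follows. The abstract does not state which statistic or cutoff was used, so the log2 variance ratio, the `threshold` value, the TAD identifiers, and the expression values here are all hypothetical.

```python
import math
from statistics import variance


def log2_variance_ratio(cancer, normal):
    """log2 of (sample variance in cancer) / (sample variance in normal)."""
    # Assumes both groups have non-zero variance.
    return math.log2(variance(cancer) / variance(normal))


def flag_altered_tads(expression_by_tad, threshold=1.0):
    """Return TAD ids whose |log2 variance ratio| meets the threshold.

    expression_by_tad maps a TAD id to a (cancer, normal) pair of
    expression-value lists for the genes inside that TAD.
    """
    flagged = []
    for tad_id, (cancer, normal) in expression_by_tad.items():
        if abs(log2_variance_ratio(cancer, normal)) >= threshold:
            flagged.append(tad_id)
    return flagged


# Made-up toy values, not data from the study.
toy = {
    "TAD_A": ([1.0, 5.0, 9.0, 13.0], [4.0, 5.0, 6.0, 7.0]),
    "TAD_B": ([4.0, 5.0, 6.0, 7.0], [4.5, 5.5, 6.5, 7.5]),
}

if __name__ == "__main__":
    print(flag_altered_tads(toy))
```

In a real analysis a formal test of equal variances with multiple-testing correction would likely replace the fixed threshold; the sketch only shows the shape of the comparison.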
Date Created
2022

Gender Variability in Latent Fingermark Degradation Studies

Description
Fingermarks have been used by law enforcement agencies to identify suspects in criminal activity. Although fingermarks remain persistent over time, the degradation pattern of latent fingermarks remains unknown. Previous studies examined the morphology of friction ridges on a two-dimensional scale, but recently 3D technology has been employed to examine how the height dimension degrades over time. The Sa statistic was formulated to monitor the aging process of friction ridge heights from 6 donors. Fingermarks were deposited on two nonporous substrates (glass or plastic) and aged under dark or light exposure for 98 days. Pressure, time of contact, and treatment of the finger prior to deposition were held constant, while temperature and humidity were monitored throughout the study. Experimental variables included substrate and light exposure. Females exhibited slower degradation than males. For fingermarks deposited on glass, faster degradation was seen under light exposure. This finding was consistent for fingermarks deposited on plastic, but instrument contamination may have occurred. Slower degradation was seen on glass under both light exposures. This study indicates the Sa statistic is valuable for assessing fingermark degradation of friction ridges. However, due to a small sample size and variability in the rate of degradation between donors and genders and under different lighting and substrate conditions, the age of latent fingermarks cannot be determined at this time.
Date Created
2020-05

Reproducibility and Repeatability Experiment with Nested Factors in Fingerprint Age Analysis

Description
Gage reproducibility and repeatability methods do not account for a mix of random and fixed effects, nested factors, and repeated measures. Using a case study in fingerprint analysis, we propose a new method using linear mixed effects models to determine the decomposition of the variation components in a measurement system. The fingerprint analysis tests whether the measuring system for ridge widths is reproducible and repeatable. Using the new model and traditional measurement systems analysis metrics, we found that the current process to measure ridge widths is not adequate. Further, we discovered that it is possible to use a linear mixed model to decompose the variance of a measurement system.
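For orientation, decomposing measurement variation into components can be sketched with the classical balanced one-way ANOVA (method-of-moments) estimator. This is a deliberately simpler technique than the linear mixed effects model the abstract proposes, and it does not handle nested factors or fixed effects; the operator names and ridge-width values are hypothetical.

```python
from statistics import mean


def variance_components(measurements):
    """Method-of-moments variance components for a balanced design.

    measurements maps each operator to repeated measurements of the
    same feature. Returns the repeatability (within-operator) and
    reproducibility (between-operator) variance estimates.
    """
    groups = list(measurements.values())
    k = len(groups)        # number of operators
    n = len(groups[0])     # repeats per operator
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a balanced design with k >= 2 and n >= 2")

    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)

    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    ms_within = ss_within / (k * (n - 1))
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)

    return {
        "repeatability": ms_within,
        # Negative estimates are truncated to zero, as is conventional.
        "reproducibility": max(0.0, (ms_between - ms_within) / n),
    }


# Made-up ridge-width readings, for illustration only.
toy = {
    "operator_1": [10.0, 12.0],
    "operator_2": [14.0, 16.0],
    "operator_3": [18.0, 20.0],
}

if __name__ == "__main__":
    print(variance_components(toy))
```

A mixed model fit would estimate the same kind of components while also accommodating the nested factors and repeated measures that this simple estimator ignores.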
Date Created
2019-05

Emergence of New Technology and Statistical Analysis to Explore Aging Patterns in Latent Fingerprint Analysis

Description
Latent fingerprints are a critical component of the evidence that is captured and analyzed from crime scenes and presented for convictions in court. Although fingerprint science has been used for many years in forensics, it is not without many criticisms from those who believe it is too subjective. Researchers from many disciplines have tried to refute this claim by completing experiments that would eventually lead to a fingerprint aging technique as well as by providing statistical models and mathematical support. In this literature review, widely published and discussed research in this field was reviewed and analyzed to determine what aspects of the experiments are benefitting the study of degradation. By carefully combing through the methods and results of each study, it can be determined where future focus should be and which disciplines should be drawn on for knowledge. Lastly, an important aspect of the experiments in recent years has depended on collaboration with statistics, so this evidence was examined to identify which models are realistic in determining error rates and likelihood ratios to support latent fingerprint evidence in court. After a thorough review, it is seen that although large strides have been taken to study the degradation of fingerprints, the day when fingerprints can be definitively aged may be a long way off. The current experiments have provided methods such as three-dimensional and visual parameters that could potentially find the solution, but have also uncovered methods such as immunolabeling and chemical composition that face major challenges. From a statistical point of view, researchers are very close to developing equations that exploit the likelihood ratios of similarity and even calculate the various possible error rates. The evidence found in this review shows that science is one step closer to the age determination of fingerprints.
Date Created
2018-05

Classification for Conservation: A Random Forest Model to Predict Threatened Marine Species

Description
As threats to Earth's biodiversity continue to evolve, an effective methodology to predict such threats is crucial to ensure the survival of living species. Organizations like the International Union for Conservation of Nature (IUCN) monitor the Earth's environmental networks to preserve the sanctity of terrestrial and marine life. The IUCN Red List of Threatened Species informs the conservation activities of governments as a world standard of species' risks of extinction. However, the IUCN's current methodology is, in some ways, inefficient given the immense volume of Earth's species and the laboriousness of its species' risk classification process. IUCN assessors can take years to classify a species' extinction risk, even as that species continues to decline. Therefore, to supplement the IUCN's classification process and thus bolster conservationist efforts for threatened species, a Random Forest model was constructed, trained on a group of fish species previously classified by the IUCN Red List. This Random Forest model both validates the IUCN Red List's classification method and offers a highly efficient, supplemental classification method for species' extinction risk. In addition, this Random Forest model is applicable to species with deficient data, which the IUCN Red List is otherwise unable to classify, thus engendering conservationist efforts for previously obscure species. Although this Random Forest model is built specifically for the trained fish species (Sparidae), the methodology can and should be extended to additional species.
Date Created
2018-05