Making the Best of What We Have: Novel Strategies for Training Neural Networks under Restricted Labeling Information

Description
Recent advancements in computer vision models have largely been driven by supervised training on labeled data. However, the process of labeling datasets remains both costly and time-intensive. This dissertation investigates how to improve the performance of deep neural networks when faced with limited or no labeling information. I address this challenge through four primary methodologies: domain adaptation, self-supervision, input regularization, and label regularization. In situations where labeled data is unavailable but a similar labeled dataset exists, domain adaptation emerges as a valuable strategy for transferring knowledge from the labeled dataset to the target dataset. This dissertation introduces three innovative domain adaptation methods that operate at the pixel, feature, and output levels. Another approach to tackling the absence of labels involves a novel self-supervision technique tailored to train Vision Transformers to extract rich features. The third and fourth approaches focus on scenarios where only a limited amount of labeled data is available. In such cases, I present novel regularization techniques designed to mitigate overfitting by modifying the input data and the target labels, respectively.
Date Created
2024

Novel Deep Learning Algorithms for Enhancing Inference in Cross-Modal Applications

Description
With the exponential growth of multi-modal data in the field of computer vision, the ability to perform inference effectively across multiple modalities, such as visual, textual, and auditory data, presents significant opportunities. The rapid development of cross-modal applications such as retrieval and association is primarily attributed to their ability to bridge the gap between different modalities of data. However, current mainstream cross-modal methods rely heavily on the availability of fully annotated paired data, which presents a significant challenge given the scarcity of precisely matched datasets in real-world scenarios. In response to this bottleneck, this dissertation introduces novel deep learning algorithms aimed at enhancing inference capabilities across a broad spectrum of cross-modal applications, with four primary contributions. First, it introduces an algorithm for image retrieval by learning hashing codes. This algorithm uses data from the other modality only in the form of weakly supervised tags rather than supervised labels. Second, it designs a novel framework for learning joint embeddings of images and texts for cross-modal retrieval tasks. The framework efficiently learns binary codes from the continuous CLIP feature space and can deliver performance competitive with non-hashing methods. Third, it develops a method to learn fragment-level embeddings that capture fine-grained cross-modal associations between images and texts. This method uses the fragment proposals in an unsupervised manner. Lastly, this dissertation outlines an algorithm to enhance the mask-text association ability of pre-trained semantic segmentation models with zero examples provided. Plans to further improve this algorithm for semantic segmentation tasks are also discussed.
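As a rough illustration of retrieval with hashing codes, the sketch below binarizes real-valued feature vectors by sign and ranks database items by Hamming distance. It is a minimal toy under stated assumptions, not the dissertation's algorithm: the function names, the sign-threshold rule, and the feature values are all made up for illustration, and no learned model or CLIP features are involved.

```python
def binarize(features):
    """Map a real-valued feature vector to a binary code by sign thresholding."""
    return [1 if x > 0 else 0 for x in features]


def hamming_distance(code_a, code_b):
    """Count the positions at which two equal-length binary codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b))


def retrieve(query_code, database_codes, top_k=1):
    """Return the indices of the top_k database codes closest to the query."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )
    return ranked[:top_k]


# Hypothetical continuous features (illustrative values only).
text_query = [0.9, -0.2, 0.4, -0.7]
image_db = [
    [0.8, -0.1, 0.3, -0.5],   # same sign pattern as the query
    [-0.6, 0.7, -0.2, 0.1],   # opposite sign pattern
    [0.5, 0.2, 0.1, -0.9],    # differs from the query in one position
]

query_code = binarize(text_query)
db_codes = [binarize(v) for v in image_db]
print(retrieve(query_code, db_codes, top_k=2))  # [0, 2]
```

Binary codes make comparison cheap because the distance reduces to counting differing bits, which is the usual motivation for hashing-based retrieval.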
Date Created
2024

Improving and Automating Machine Learning Model Compression

Description
Machine learning models are increasingly employed by smart devices on the edge to support important applications such as real-time virtual assistants and privacy-preserving healthcare. However, deploying state-of-the-art (SOTA) deep learning models on devices faces multiple serious challenges. First, it is infeasible to deploy large models on resource-constrained edge devices whereas small models cannot achieve the SOTA accuracy. Second, it is difficult to customize the models according to diverse application requirements in accuracy and speed and diverse capabilities of edge devices. This study proposes several novel solutions to comprehensively address the above challenges through automated and improved model compression. First, it introduces Automatic Attention Pruning (AAP), an adaptive, attention-based pruning approach to automatically reduce model parameters while meeting diverse user objectives in model size, speed, and accuracy. AAP achieves an impressive 92.72% parameter reduction in ResNet-101 on Tiny-ImageNet without causing any accuracy loss. Second, it presents Self-Supervised Quantization-Aware Knowledge Distillation (SQAKD), a framework for reducing model precision without supervision from labeled training data. For example, it quantizes VGG-8 to 2 bits on CIFAR-10 without any accuracy loss. Finally, the study explores two more works, Contrastive Knowledge Distillation Framework (CKDF) and Log-Curriculum based Module Replacing (LCMR), for further improving the performance of small models. All the works proposed in this study are designed to address real-world challenges, and have been successfully deployed on diverse hardware platforms, including cloud instances and edge devices, catalyzing AI for the edge.
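To make "reducing model precision" concrete, the sketch below applies plain uniform quantization to a short list of weights, snapping each value to one of 2**bits evenly spaced levels. This is only a generic post-hoc illustration under assumed names and made-up weights; it is not SQAKD, which according to the abstract combines quantization-aware training with self-supervised knowledge distillation.

```python
def quantize_uniform(weights, bits):
    """Snap each weight to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # A constant tensor is already representable with a single level.
        return list(weights)
    step = (hi - lo) / (levels - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


# Hypothetical full-precision weights (illustrative values only).
weights = [-1.0, -0.4, 0.1, 0.5, 1.0]
quantized = quantize_uniform(weights, bits=2)

# Each quantized value lies within half a step of the original weight.
step = (max(weights) - min(weights)) / (2 ** 2 - 1)
max_error = max(abs(q - w) for q, w in zip(quantized, weights))
print(quantized, max_error <= step / 2 + 1e-9)
```

With 2 bits there are only four distinct values available, which is why aggressive quantization normally costs accuracy unless training compensates for it.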
Date Created
2024

Towards Unsupervised Denoising of Magnetic Resonance Imaging Scans

Description
Image denoising, a fundamental task in computer vision, poses significant challenges due to its inherently inverse and ill-posed nature. Despite advancements in traditional methods and supervised learning approaches, particularly in medical imaging such as Magnetic Resonance Imaging (MRI) scans, the reliance on paired datasets and known noise distributions remains a practical hurdle. Recent progress in noise statistical independence theory and diffusion models has revitalized research interest, offering promising avenues for unsupervised denoising. However, existing methods often yield overly smoothed results or introduce hallucinated structures, limiting their clinical applicability. This thesis tackles the core challenge of progressing towards unsupervised denoising of MRI scans. It aims to retain intricate details without smoothing or introducing artificial structures, thus ensuring the production of high-quality MRI images. The thesis makes a three-fold contribution. First, it presents a detailed analysis of traditional techniques, early machine learning algorithms for denoising, and new statistical-based models, with an extensive evaluation study on self-supervised denoising methods highlighting their limitations. Second, it conducts an evaluation study on an emerging class of diffusion-based denoising methods, accompanied by additional empirical findings and discussions on their effectiveness and limitations, proposing solutions to enhance their utility. Lastly, it introduces a novel approach, Unsupervised Multi-stage Ensemble Deep Learning with diffusion models for denoising MRI scans (MEDL). Leveraging diffusion models, this approach operates independently of signal or noise priors and incorporates weighted rescaling of multi-stage reconstructions to balance over-smoothing and hallucination tendencies. Evaluation using benchmark datasets demonstrates an average gain of 1 dB in PSNR and 2% in SSIM over existing approaches.
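For readers unfamiliar with the PSNR metric cited above, the sketch below computes it from the mean squared error between a reference signal and an estimate. The pixel values are made up and the signals are flat lists rather than images; it only illustrates the standard definition and does not reproduce the thesis's evaluation.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB for two equal-length signals."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical intensities in [0, 1] (illustrative values only).
clean = [0.0, 0.5, 1.0, 0.5]
noisy = [0.1, 0.4, 0.9, 0.6]
denoised = [0.05, 0.45, 0.95, 0.55]

gain_db = psnr(clean, denoised) - psnr(clean, noisy)
print(f"PSNR gain from denoising: {gain_db:.2f} dB")
```

Because PSNR is logarithmic in the error, halving the per-pixel error (as in this toy) raises PSNR by about 6 dB, so a 1 dB average gain corresponds to a more modest reduction in mean squared error.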
Date Created
2024

Deployable Web GUI for LLM Applications

Description
The scientific manuscript review stage is a key part of the modern scientific process. It involves rigorous evaluation of new papers by peers to assess the significance of contributions in a particular area of study and ensure that papers meet high standards. This process helps maintain the quality and credibility of research. However, some reviews can be toxic or overly discouraging, leading to unintentional psychological damage (such as anxiety or depression) to paper authors and detracting from the constructive tone of the review space. This Thesis/Creative Project was completed alongside a capstone project that aims to address this issue. The goal is to fine-tune a Large Language Model (LLM) that can first accurately identify toxic sentences within a paper review and then revise them in a way that maintains the criticism but delivers it in a more friendly or encouraging tone. To be used effectively, this LLM requires a Graphical User Interface (GUI) so that end users (such as editors, associate editors, and reviewers) can easily interact with it. This allows them to update the wording of a review effectively while maintaining scientific integrity. While the GUI provides a user-friendly way to interact with the LLM, there are technical challenges in running an LLM application in a web-based framework. LLMs are computationally expensive to run and require significant GPU RAM, which can be a limiting factor, especially in a web-based framework with limited resources. One potential solution to this problem is model quantization, which can reduce the memory footprint of the model. However, this introduces the problem of model drift, as the model’s performance may decrease when quantized. This drift needs to be measured to ensure the model continues to provide accurate results.
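The memory argument for quantization can be checked with simple arithmetic. The sketch below estimates the memory needed for model weights alone at several bit widths. The 7-billion-parameter count is a hypothetical figure chosen for illustration (the abstract does not state the model size), and the estimate ignores activations, caches, and framework overhead.

```python
def weight_memory_gib(num_parameters, bits_per_weight):
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = num_parameters * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


# Hypothetical model size; not taken from the project description.
HYPOTHETICAL_PARAMS = 7_000_000_000

for bits in (16, 8, 4):
    gib = weight_memory_gib(HYPOTHETICAL_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {gib:.1f} GiB")
```

Weight memory scales linearly with bit width, so moving from 16-bit to 4-bit weights cuts it by a factor of four; whether accuracy survives that reduction is the drift question the abstract says must be measured.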
Date Created
2024-05

Virality in the Digital Age: Contextualization, Messaging Strategies, and Framing Detection

Description
Social networking platforms have redefined communication, serving as conduits for swift global information dissemination on contemporary topics and trends. This research probes information cascade (IC) dynamics, focusing on viral IC, where user-shared information gains rapid, widespread attention. Implications of IC span advertising, persuasion, opinion-shaping, and crisis response. First, this dissertation aims to unravel the context behind viral content, particularly in the realm of the digital world, introducing a semi-supervised taxonomy induction framework (STIF). STIF employs state-of-the-art term representation, topical phrase detection, and clustering to organize terms into a two-level topic taxonomy. Social scientists then assess the topic clusters for coherence and completeness. STIF proves effective, significantly reducing human coding efforts (by up to 74%) while accurately inducing taxonomies and term-to-topic mappings due to the high purity of its topics. Second, to profile the drivers of virality, this study investigates messaging strategies influencing message virality. Three content-based hypotheses are formulated and tested, demonstrating that incorporation of “negativity bias,” “causal arguments,” and “threats to personal or societal core values,” singularly and jointly, significantly enhances message virality on social media, quantified by retweet counts. Furthermore, the study highlights framing narratives’ pivotal role in shaping discourse, particularly in adversarial campaigns. An innovative pipeline for automatic framing detection is introduced and tested on a collection of texts on the Russia-Ukraine conflict. The pipeline integrates representation learning, overlapping graph-clustering, and a unique Topic Actor Graph (TAG) synthesis method, and the developed scoring mechanism maps sentences to automatically detect framing signatures. This pipeline attains an F1 score of 92% and a weighted accuracy of 95% for framing detection on a real-world dataset. In essence, this dissertation focuses on the multidimensional exploration of information cascade, uncovering the context and drivers of content virality, and automating framing detection. Through innovative methodologies like STIF, messaging strategy analysis, and TAG Frames, the research contributes valuable insights into the mechanics of viral content spread and framing nuances within the digital landscape, enriching fields such as advertisement, communication, public discourse, and crisis response strategies.
Date Created
2023

Knowledge Distillation with Geometric Approaches for Multimodal Data Analysis

Description
This thesis presents robust and novel solutions using knowledge distillation with geometric approaches and multimodal data that can address the current challenges in deep learning, providing a comprehensive understanding of the learning process involved in knowledge distillation. Deep learning has attained significant success in various applications, such as health and wellness promotion, smart homes, and intelligent surveillance. In general, stacking more layers or increasing the number of trainable parameters causes deep networks to exhibit improved performance. However, this causes the model to become large, resulting in an additional need for computing and power resources for training, storage, and deployment. These are the core challenges in incorporating such models into small devices with limited power and computational resources. In this thesis, robust solutions aimed at addressing the aforementioned challenges are presented. These proposed methodologies and algorithmic contributions enhance the performance and efficiency of deep learning models. The thesis encompasses a comprehensive exploration of knowledge distillation, an approach that holds promise for creating compact models from high-capacity ones, while preserving their performance. This exploration covers diverse datasets, including both time series and image data, shedding light on the pivotal role of augmentation methods in knowledge distillation. The effects of these methods are rigorously examined through empirical experiments. Furthermore, the study within this thesis delves into the efficient utilization of features derived from two different teacher models, each trained on dissimilar data representations, including time-series and image data. Through these investigations, I present novel approaches to knowledge distillation, leveraging geometric techniques for the analysis of multimodal data. 
These solutions not only address real-world challenges but also offer valuable insights and recommendations for modeling in new applications.
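As background on knowledge distillation in general, the sketch below computes the widely used temperature-softened distillation loss: the KL divergence between teacher and student output distributions, scaled by the squared temperature. This is the generic formulation only, with made-up logits and assumed function names; it does not include the geometric or multimodal components this thesis proposes.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by temperature**2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a three-class problem (illustrative values only).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 4.0, 1.0]

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

A higher temperature flattens both distributions, so the student is also penalized for mismatching the teacher's relative ranking of the non-target classes, not just its top prediction.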
Date Created
2023

Neuro-Symbolic AI Approaches to Enhance Deep Neural Networks with Logical Reasoning and Knowledge Integration

Description
One of the challenges in Artificial Intelligence (AI) is to integrate fast, automatic, and intuitive System-1 thinking with slow, deliberate, and logical System-2 thinking. While deep learning approaches excel at perception tasks for System-1, their reasoning capabilities for System-2 are limited. Moreover, deep learning approaches are usually data-hungry, have difficulty making use of explicit knowledge, and struggle with interpretability and justification. This dissertation presents three neuro-symbolic AI approaches that integrate neural networks (NNs) with symbolic AI methods to address these issues. The first approach presented in this dissertation is NeurASP, which combines NNs with Answer Set Programming (ASP), a logic programming formalism. NeurASP provides an effective way to integrate sub-symbolic and symbolic computation by treating NN outputs as probability distributions over atomic facts in ASP. The explicit knowledge encoded in ASP corrects mistakes in NN outputs and allows for better training with less data. To avoid NeurASP's bottleneck in symbolic computation, this dissertation presents Constraint Loss via Straight-Through Estimators (CL-STE). CL-STE provides a systematic way to compile discrete logical constraints into a loss function over discretized NN outputs and scales significantly better than state-of-the-art neuro-symbolic methods. This dissertation also presents a finding from applying CL-STE to Transformers: Transformers can be extended with recurrence to enhance their power for multi-step reasoning. Such a Recurrent Transformer can be applied straightforwardly to visual constraint reasoning problems while successfully addressing the symbol grounding problem. Lastly, this dissertation addresses the limitation of pre-trained Large Language Models (LLMs) on multi-step logical reasoning problems with a dual-process neuro-symbolic reasoning system called LLM+ASP, where an LLM (e.g., GPT-3) serves as a highly effective few-shot semantic parser that turns natural language sentences into a logical form that can be used as input to ASP. LLM+ASP achieves state-of-the-art performance on several textual reasoning benchmarks and can handle robot planning tasks that an LLM alone fails to solve.
Date Created
2023

Understanding the Effects of Orthogonal Convolution in Transfer Learning for Medical Image Analysis

Description
Insufficient training data poses significant challenges to training a deep convolutional neural network (CNN) to solve a target task. One common solution to this problem is to use transfer learning with pre-trained networks to apply knowledge learned from one domain with sufficient data to a new domain with limited data and avoid training a deep network from scratch. However, for such methods to work in a transfer learning setting, learned features from the source domain need to be generalizable to the target domain, which is not guaranteed since the feature space and distributions of the source and target data may be different. This thesis aims to explore and understand the use of orthogonal convolutional neural networks to improve learning of diverse, generic features that are transferable to a novel task. In this thesis, orthogonal regularization is used to pre-train deep CNNs to investigate if and how orthogonal convolution may improve feature extraction in transfer learning. Experiments using two limited medical image datasets suggest that orthogonal regularization improves generality and reduces redundancy of learned features more effectively in certain deep networks for transfer learning. The results on feature selection and classification demonstrate that the improvement in transferred features helps select more expressive features that improve generalization performance. To understand the effectiveness of orthogonal regularization on different architectures, this work studies the effects of residual learning on orthogonal convolution. Specifically, this work examines the presence of residual connections and their effects on feature similarities, and shows that residual learning blocks help orthogonal convolution better preserve feature diversity across the convolutional layers of a network and alleviate the increase in feature similarities caused by depth, demonstrating the importance of residual learning in making orthogonal convolution more effective.
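One common way to express an orthogonality penalty is the squared Frobenius norm of W Wᵀ − I, where each row of W is a flattened filter. The sketch below computes that penalty in plain Python. It is an assumed, simplified kernel-level form with made-up filters; the thesis's orthogonal convolution regularizer may be defined differently (for example, on the convolution operator itself rather than on the flattened kernels).

```python
def soft_orthogonality_penalty(filters):
    """Squared Frobenius norm of (W W^T - I); rows of W are flattened filters."""
    n = len(filters)
    penalty = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(filters[i], filters[j]))
            target = 1.0 if i == j else 0.0
            penalty += (dot - target) ** 2
    return penalty


# Hypothetical flattened filters (illustrative values only).
orthonormal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
redundant = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

print(soft_orthogonality_penalty(orthonormal))  # 0.0
print(soft_orthogonality_penalty(redundant))    # 2.0
```

The penalty is zero only when the filters are mutually orthogonal with unit norm, so adding it to the training loss discourages duplicated filters, which matches the abstract's stated goal of reducing feature redundancy.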
Date Created
2023

Vision-inspired Representation and Learning for Data-driven Signal Processing

Description
In the era of data explosion, massive data is generated from various sources at an unprecedented speed. The ever-growing amount of data reveals enormous opportunities for developing novel data-driven solutions to unsolved problems. In recent years, benefiting from numerous public datasets and advances in deep learning, data-driven approaches in the computer vision domain have demonstrated superior performance with high adaptability on various data and tasks. Meanwhile, signal processing has long been dominated by techniques derived from rigorous mathematical models built upon prior knowledge of signals. Due to the lack of adaptability to real data and applications, model-based methods often suffer from performance degradation and engineering difficulties. In this dissertation, multiple signal processing problems are studied from vision-inspired data representation and learning perspectives to address the major limitation on adaptability. Corresponding data-driven solutions are proposed to achieve significantly improved performance over conventional solutions. Specifically, in the compressive sensing domain, an open-source image compressive sensing toolbox and benchmark to standardize the implementation and evaluation of reconstruction methods are first proposed. Then a plug-and-play compression ratio adapter is proposed to enable the adaptability of end-to-end data-driven reconstruction methods to variable compression ratios. Lastly, the problem of transfer learning from images to bioelectric signals is experimentally studied to demonstrate the improved performance of data-driven reconstruction. In the image subsampling domain, task-adaptive data-driven image subsampling is studied to reduce data redundancy and retain information of interest simultaneously. In the semiconductor analysis domain, the data-driven automatic error detection problem is studied in the context of integrated circuit segmentation for the first time. 
In the light detection and ranging (LiDAR) camera calibration domain, the calibration accuracy degradation problem in low-resolution LiDAR scenarios is addressed with data-driven techniques.
Date Created
2023