Making the Best of What We Have: Novel Strategies for Training Neural Networks under Restricted Labeling Information

193841-Thumbnail Image.png
Description
Recent advancements in computer vision models have largely been driven by supervised training on labeled data. However, the process of labeling datasets remains both costly and time-intensive. This dissertation delves into enhancing the performance of deep neural networks when faced

Recent advancements in computer vision models have largely been driven by supervised training on labeled data. However, the process of labeling datasets remains both costly and time-intensive. This dissertation delves into enhancing the performance of deep neural networks when faced with limited or no labeling information. I address this challenge through four primary methodologies: domain adaptation, self-supervision, input regularization, and label regularization. In situations where labeled data is unavailable but a similar dataset exists, domain adaptation emerges as a valuable strategy for transferring knowledge from the labeled dataset to the target dataset. This dissertation introduces three innovative domain adaptation methods that operate at pixel, feature, and output levels.Another approach to tackle the absence of labels involves a novel self-supervision technique tailored to train Vision Transformers in extracting rich features. The third and fourth approaches focus on scenarios where only a limited amount of labeled data is available. In such cases, I present novel regularization techniques designed to mitigate overfitting by modifying the input data and the target labels, respectively.
Date Created
2024
Agent

Monocular Visual Odometry: Deep Learning vs Classical Approaches

171832-Thumbnail Image.png
Description
Visual Odometry is one of the key aspects of robotic localization and mapping. Visual Odometry consists of many geometric-based approaches that convert visual data (images) into pose estimates of where the robot is in space. The classical geometric methods have

Visual Odometry is one of the key aspects of robotic localization and mapping. Visual Odometry consists of many geometric-based approaches that convert visual data (images) into pose estimates of where the robot is in space. The classical geometric methods have shown promising results; they are carefully crafted and built explicitly for these tasks. However, such geometric methods require extreme fine-tuning and extensive prior knowledge to set up these systems for different scenarios. Classical Geometric approaches also require significant post-processing and optimization to minimize the error between the estimated pose and the global truth. In this body of work, the deep learning model was formed by combining SuperPoint and SuperGlue. The resulting model does not require any prior fine-tuning. It has been trained to enable both outdoor and indoor settings. The proposed deep learning model is applied to the Karlsruhe Institute of Technology and Toyota Technological Institute dataset along with other classical geometric visual odometry models. The proposed deep learning model has not been trained on the Karlsruhe Institute of Technology and Toyota Technological Institute dataset. It is only during experimentation that the deep learning model is first introduced to the Karlsruhe Institute of Technology and Toyota Technological Institute dataset. Using the monocular grayscale images from the visual odometer files of the Karlsruhe Institute of Technology and Toyota Technological Institute dataset, through the experiment to test the viability of the models for different sequences. The experiment has been performed on eight different sequences and has obtained the Absolute Trajectory Error and the time taken for each sequence to finish the computation. From the obtained results, there are inferences drawn from the classical and deep learning approaches.
Date Created
2022
Agent

Analysis of Machine Learning Assisted Fatigue Identification in Radiology Readings

171646-Thumbnail Image.png
Description
Fatigue in radiology is a readily studied area. Machine learning concepts appliedto the identification of fatigue are also readily available. However, the intersection between the two areas is not a relative commonality. This study looks to explore the intersection of fatigue in

Fatigue in radiology is a readily studied area. Machine learning concepts appliedto the identification of fatigue are also readily available. However, the intersection between the two areas is not a relative commonality. This study looks to explore the intersection of fatigue in radiology and machine learning concepts by analyzing temporal trends in multivariate time series data. A novel methodological approach using support vector machines to observe temporal trends in time-based aggregations of time series data is proposed. The data used in the study is captured in a real-world, unconstrained radiology setting where gaze and facial metrics are captured from radiologists performing live image reviews. The captured data is formatted into classes whose labels represent a window of time during the radiologist’s review. Using the labeled classes, the decision function and accuracy of trained, linear support vector machine models are evaluated to produce a visualization of temporal trends and critical inflection points as well as the contribution of individual features. Consequently, the study finds valid potential justification in the methods suggested. The study offers a prospective use of maximummargin classification to demarcate the manipulation of an abstract phenomenon such as fatigue on temporal data. Potential applications are envisioned that could improve the workload distribution of the medical act.
Date Created
2022
Agent

Adversarial Machine Learning for Recommendation Systems

168538-Thumbnail Image.png
Description
Recently, Generative Adversarial Networks (GANs) have been applied to the problem of Cold-Start Recommendation, but the training performance of these models is hampered by the extreme sparsity in warm user purchase behavior. This thesis introduces a novel representation for user-vectors

Recently, Generative Adversarial Networks (GANs) have been applied to the problem of Cold-Start Recommendation, but the training performance of these models is hampered by the extreme sparsity in warm user purchase behavior. This thesis introduces a novel representation for user-vectors by combining user demographics and user preferences, making the model a hybrid system which uses Collaborative Filtering and Content Based Recommendation. This system models user purchase behavior using weighted user-product preferences (explicit feedback) rather than binary user-product interactions (implicit feedback). Using this a novel sparse adversarial model, Sparse ReguLarized Generative Adversarial Network (SRLGAN), is developed for Cold-Start Recommendation. SRLGAN leverages the sparse user-purchase behavior which ensures training stability and avoids over-fitting on warm users. The performance of SRLGAN is evaluated on two popular datasets and demonstrate state-of-the-art results.
Date Created
2022
Agent

Recognizing Compositional Actions in Videos with Temporal Ordering

168522-Thumbnail Image.png
Description
In some scenarios, true temporal ordering is required to identify the actions occurring in a video. Recently a new synthetic dataset named CATER, was introduced containing 3D objects like sphere, cone, cylinder etc. which undergo simple movements such as slide,

In some scenarios, true temporal ordering is required to identify the actions occurring in a video. Recently a new synthetic dataset named CATER, was introduced containing 3D objects like sphere, cone, cylinder etc. which undergo simple movements such as slide, pick & place etc. The task defined in the dataset is to identify compositional actions with temporal ordering. In this thesis, a rule-based system and a window-based technique are proposed to identify individual actions (atomic) and multiple actions with temporal ordering (composite) on the CATER dataset. The rule-based system proposed here is a heuristic algorithm that evaluates the magnitude and direction of object movement across frames to determine the atomic action temporal windows and uses these windows to predict the composite actions in the videos. The performance of the rule-based system is validated using the frame-level object coordinates provided in the dataset and it outperforms the performance of the baseline models on the CATER dataset. A window-based training technique is proposed for identifying composite actions in the videos. A pre-trained deep neural network (I3D model) is used as a base network for action recognition. During inference, non-overlapping windows are passed through the I3D network to obtain the atomic action predictions and the predictions are passed through a rule-based system to determine the composite actions. The approach outperforms the state-of-the-art composite action recognition models by 13.37% (mAP 66.47% vs. mAP 53.1%).
Date Created
2022
Agent

Application of Deep Learning Techniques for EEG Signal Classification

168404-Thumbnail Image.png
Description
Communicating with computers through thought has been a remarkable achievement in recent years. This was made possible by the use of Electroencephalography (EEG). Brain-computer interface (BCI) relies heavily on Electroencephalography (EEG) signals for communication between humans and computers. With the

Communicating with computers through thought has been a remarkable achievement in recent years. This was made possible by the use of Electroencephalography (EEG). Brain-computer interface (BCI) relies heavily on Electroencephalography (EEG) signals for communication between humans and computers. With the advent ofdeep learning, many studies recently applied these techniques to EEG data to perform various tasks like emotion recognition, motor imagery classification, sleep analysis, and many more. Despite the rise of interest in EEG signal classification, very few studies have explored the MindBigData dataset, which collects EEG signals recorded at the stimulus of seeing a digit and thinking about it. This dataset takes us closer to realizing the idea of mind-reading or communication via thought. Thus classifying these signals into the respective digit that the user thinks about is a challenging task. This serves as a motivation to study this dataset and apply existing deep learning techniques to study it. Given the recent success of transformer architecture in different domains like Computer Vision and Natural language processing, this thesis studies transformer architecture for EEG signal classification. Also, it explores other deep learning techniques for the same. As a result, the proposed classification pipeline achieves comparable performance with the existing methods.
Date Created
2021
Agent

Effect of Image Captioning with Description on the Working Memory

161949-Thumbnail Image.png
Description
Working memory plays an important role in human activities across academic,professional, and social settings. Working memory is dened as the memory extensively involved in goal-directed behaviors in which information must be retained and manipulated to ensure successful task execution. The aim of

Working memory plays an important role in human activities across academic,professional, and social settings. Working memory is dened as the memory extensively involved in goal-directed behaviors in which information must be retained and manipulated to ensure successful task execution. The aim of this research is to understand the effect of image captioning with image description on an individual's working memory. A study was conducted with eight neutral images comprising situations relatable to daily life such that each image could have a positive or negative description associated with the outcome of the situation in the image. The study consisted of three rounds where the first and second round involved two parts and the third round consisted of one part. The image was captioned a total of five times across the entire study. The findings highlighted that only 25% of participants were able to recall the captions which they captioned for an image after a span of 9-15 days; when comparing the recall rate of the captions, 50% of participants were able to recall the image caption from the previous round in the present round; and out of the positive and negative description associated with the image, 65% of participants recalled the former description rather than the latter. The conclusions drawn from the study are participants tend to retain information for longer periods than the expected duration for working memory, which may be because participants were able to relate the images with their everyday life situations and given a situation with positive and negative information, the human brain is aligned towards positive information over negative information.
Date Created
2021
Agent

Haptic Vision: Augmenting Non-visual Travel Tools, Techniques, and Methods by Increasing Spatial Knowledge Through Dynamic Haptic Interactions

158792-Thumbnail Image.png
Description
Access to real-time situational information including the relative position and motion of surrounding objects is critical for safe and independent travel. Object or obstacle (OO) detection at a distance is primarily a task of the visual system due to the

Access to real-time situational information including the relative position and motion of surrounding objects is critical for safe and independent travel. Object or obstacle (OO) detection at a distance is primarily a task of the visual system due to the high resolution information the eyes are able to receive from afar. As a sensory organ in particular, the eyes have an unparalleled ability to adjust to varying degrees of light, color, and distance. Therefore, in the case of a non-visual traveler, someone who is blind or low vision, access to visual information is unattainable if it is positioned beyond the reach of the preferred mobility device or outside the path of travel. Although, the area of assistive technology in terms of electronic travel aids (ETA’s) has received considerable attention over the last two decades; surprisingly, the field has seen little work in the area focused on augmenting rather than replacing current non-visual travel techniques, methods, and tools. Consequently, this work describes the design of an intuitive tactile language and series of wearable tactile interfaces (the Haptic Chair, HaptWrap, and HapBack) to deliver real-time spatiotemporal data. The overall intuitiveness of the haptic mappings conveyed through the tactile interfaces are evaluated using a combination of absolute identification accuracy of a series of patterns and subjective feedback through post-experiment surveys. Two types of spatiotemporal representations are considered: static patterns representing object location at a single time instance, and dynamic patterns, added in the HaptWrap, which represent object movement over a time interval. Results support the viability of multi-dimensional haptics applied to the body to yield an intuitive understanding of dynamic interactions occurring around the navigator during travel. Lastly, it is important to point out that the guiding principle of this work centered on providing the navigator with spatial knowledge otherwise unattainable through current mobility techniques, methods, and tools, thus, providing the \emph{navigator} with the information necessary to make informed navigation decisions independently, at a distance.
Date Created
2020
Agent

Characterizing Dysarthric Speech with Transfer Learning

158318-Thumbnail Image.png
Description
Speech is known to serve as an early indicator of neurological decline, particularly in motor diseases. There is significant interest in developing automated, objective signal analytics that detect clinically-relevant changes and in evaluating these algorithms against the existing gold-standard: perceptual

Speech is known to serve as an early indicator of neurological decline, particularly in motor diseases. There is significant interest in developing automated, objective signal analytics that detect clinically-relevant changes and in evaluating these algorithms against the existing gold-standard: perceptual evaluation by trained speech and language pathologists. Hypernasality, the result of poor control of the velopharyngeal flap---the soft palate regulating airflow between the oral and nasal cavities---is one such speech symptom of interest, as precise velopharyngeal control is difficult to achieve under neuromuscular disorders. However, a host of co-modulating variables give hypernasal speech a complex and highly variable acoustic signature, making it difficult for skilled clinicians to assess and for automated systems to evaluate. Previous work in rating hypernasality from speech relies on either engineered features based on statistical signal processing or machine learning models trained end-to-end on clinical ratings of disordered speech examples. Engineered features often fail to capture the complex acoustic patterns associated with hypernasality, while end-to-end methods tend to overfit to the small datasets on which they are trained. In this thesis, I present a set of acoustic features, models, and strategies for characterizing hypernasality in dysarthric speech that split the difference between these two approaches, with the aim of capturing the complex perceptual character of hypernasality without overfitting to the small datasets available. The features are based on acoustic models trained on a large corpus of healthy speech, integrating expert knowledge to capture known perceptual characteristics of hypernasal speech. They are then used in relatively simple linear models to predict clinician hypernasality scores. These simple models are robust, generalizing across diseases and outperforming comprehensive set of baselines in accuracy and correlation. This novel approach represents a new state-of-the-art in objective hypernasality assessment.
Date Created
2020
Agent

Generalized Domain Adaptation for Visual Domains

158278-Thumbnail Image.png
Description
Humans have a great ability to recognize objects in different environments irrespective of their variations. However, the same does not apply to machine learning models which are unable to generalize to images of objects from different domains. The generalization of

Humans have a great ability to recognize objects in different environments irrespective of their variations. However, the same does not apply to machine learning models which are unable to generalize to images of objects from different domains. The generalization of these models to new data is constrained by the domain gap. Many factors such as image background, image resolution, color, camera perspective and variations in the objects are responsible for the domain gap between the training data (source domain) and testing data (target domain). Domain adaptation algorithms aim to overcome the domain gap between the source and target domains and learn robust models that can perform well across both the domains.

This thesis provides solutions for the standard problem of unsupervised domain adaptation (UDA) and the more generic problem of generalized domain adaptation (GDA). The contributions of this thesis are as follows. (1) Certain and Consistent Domain Adaptation model for closed-set unsupervised domain adaptation by aligning the features of the source and target domain using deep neural networks. (2) A multi-adversarial deep learning model for generalized domain adaptation. (3) A gating model that detects out-of-distribution samples for generalized domain adaptation.

The models were tested across multiple computer vision datasets for domain adaptation.

The dissertation concludes with a discussion on the proposed approaches and future directions for research in closed set and generalized domain adaptation.
Date Created
2020
Agent