Generalizing Under Distribution Shifts and Data Scarcity via Geometrical and Knowledge-Aware Deep Learning

This dissertation presents novel solutions for improving the generalization capabilities of deep learning based computer vision models. Neural networks are known to suffer a large drop in performance when tested on samples from a different distribution than the one on which they were trained. The proposed solutions, based on latent space geometry and meta-learning, address this issue by improving the robustness of these models to distribution shifts. Through the use of geometrical alignment, state-of-the-art domain adaptation and source-free test-time adaptation strategies are developed. Additionally, geometrical alignment can allow classifiers to be progressively adapted to new, unseen test domains without requiring retraining of the feature extractors. The dissertation also presents algorithms for enabling in-the-wild generalization without needing access to any samples from the target domain. Other causes of poor generalization, such as data scarcity in critical applications and training data with high levels of noise and variance, are also explored. To address data scarcity in fine-grained computer vision tasks such as object detection, novel context-aware augmentations are suggested. While the first four chapters focus on general-purpose computer vision models, strategies are also developed to improve robustness in specific applications. The efficiency of training autonomous agents for visual navigation is improved by incorporating semantic knowledge, and the integration of domain experts' knowledge allows for the realization of a low-cost, minimally invasive generalizable automated rehabilitation system. Lastly, new tools for explainability and model introspection using counter-factual explainers trained through interval-based uncertainty calibration objectives are presented.
Effective Prior Selection and Knowledge Transfer for Deep Learning Applications

In recent years, deep learning has gained popularity for its ability to be applied to many computer vision applications without any a priori knowledge. However, to introduce a better inductive bias, incorporating prior knowledge along with learned information is critical. To that end, human intervention in deep learning pipelines, including the choice of algorithm, data, and model, can be considered a prior. Thus, it is extremely important to select effective priors for a given application. This dissertation explores different aspects of a deep learning pipeline and provides insights as to why a particular prior is effective for the corresponding application. For analyzing the effect of model priors, three applications that involve sequential modeling problems are chosen: Audio Source Separation, Clinical Time-series (Electroencephalogram (EEG)/Electrocardiogram (ECG)) based Differential Diagnosis, and Global Horizontal Irradiance Forecasting for Photovoltaic (PV) Applications. For data priors, the application of image classification is chosen, and a new algorithm titled “Invenio” that can effectively use data semantics for both task and distribution shift scenarios is proposed. Finally, the effectiveness of a data selection prior is shown using the application of object tracking, wherein the aim is to maintain the tracking performance while prolonging the battery life of image sensors by optimizing the data selected for reading from the environment. For every research contribution of this dissertation, several empirical studies are conducted on benchmark datasets. The proposed design choices demonstrate significant performance improvements in comparison to the existing application-specific state-of-the-art deep learning strategies.
Modeling and Exploiting the Structure of Data via Meta-Features for Robust and Efficient Machine Learning

In the standard pipeline for machine learning model development, several design decisions are made largely based on trial and error. Take the classification problem as an example. The starting point for classifier design is a dataset with samples from the classes of interest. From this, the algorithm developer must decide which features to extract, which hypothesis class to condition on, which hyperparameters to select, and how to train the model. The design process is iterative, with the developer trying different classifiers, feature sets, and hyperparameters and using cross-validation to pick the model with the lowest error. As there are no guidelines for when to stop searching, developers can continue "optimizing" the model to the point where they begin to "fit to the dataset". These problems are amplified in the active learning setting, where the initial dataset may be unlabeled and label acquisition is costly. The aim of this dissertation is to develop algorithms that provide ML developers with additional information about the complexity of the underlying problem to guide downstream model development. I introduce the concept of "meta-features": features extracted from a dataset that characterize the complexity of the underlying data-generating process. In the context of classification, the complexity of the problem can be characterized by understanding two complementary meta-features: (a) the amount of overlap between classes, and (b) the geometry/topology of the decision boundary. Across three complementary works, I present a series of estimators for the meta-features that characterize overlap and geometry/topology of the decision boundary, and demonstrate how they can be used in algorithm development.
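To make the idea of an overlap meta-feature concrete, the following is a minimal sketch and not one of the dissertation's estimators. It assumes that the leave-one-out 1-nearest-neighbour error rate is an acceptable crude proxy for class overlap; the function names `loo_1nn_error` and `make_two_gaussians` and the synthetic two-Gaussian data are hypothetical and introduced only for illustration.

```python
import math
import random


def loo_1nn_error(points, labels):
    """Leave-one-out 1-nearest-neighbour error rate.

    Used here only as a rough, assumed proxy for class overlap:
    the more the classes overlap, the more often a point's nearest
    neighbour carries a different label.
    """
    errors = 0
    for i, p in enumerate(points):
        best_j, best_d = None, math.inf
        for j, q in enumerate(points):
            if i == j:
                continue  # leave the query point itself out
            d = math.dist(p, q)
            if d < best_d:
                best_j, best_d = j, d
        if labels[best_j] != labels[i]:
            errors += 1
    return errors / len(points)


def make_two_gaussians(n_per_class, separation, seed=0):
    """Hypothetical toy data: two unit-variance 2-D Gaussians whose
    means are `separation` apart along the first axis."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, centre in ((0, 0.0), (1, separation)):
        for _ in range(n_per_class):
            points.append((rng.gauss(centre, 1.0), rng.gauss(0.0, 1.0)))
            labels.append(label)
    return points, labels


if __name__ == "__main__":
    for separation in (0.0, 2.0, 20.0):
        pts, labs = make_two_gaussians(100, separation)
        print(separation, loo_1nn_error(pts, labs))
```

On this toy data, widely separated classes should give an error near zero and fully overlapping classes an error near one half, which is the kind of signal a developer could use to judge how much further model tuning is likely to help.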
Addressing the Challenges of Automated Speech and Language Analysis for the Assessment of Mental Health and Functional Competency

Severe forms of mental illness, such as schizophrenia and bipolar disorder, are debilitating conditions that negatively impact an individual's quality of life. Additionally, they are often difficult and expensive to diagnose and manage, placing a large burden on society. Mental illness is typically diagnosed by the use of clinical interviews and a set of neuropsychiatric batteries; a key component of nearly all of these evaluations is some spoken language task. Clinicians have long used speech and language production as a proxy for neurological health, but most of these assessments are subjective in nature. Meanwhile, technological advancements in speech and natural language processing have grown exponentially over the past decade, increasing the capacity of computer models to assess particular aspects of speech and language. For this reason, many have seen an opportunity to leverage signal processing and machine learning applications to objectively assess clinical speech samples in order to automatically compute objective measures of neurological health. This document summarizes several contributions to expand upon this body of research. Mainly, there is still a large gap between the theoretical power of computational language models and their actual use in clinical applications. One of the largest concerns is the limited and inconsistent reliability of speech and language features used in models for assessing specific aspects of mental health; numerous methods may exist to measure the same or similar constructs and lead researchers to different conclusions in different studies. To address this, a novel measurement model based on a theoretical framework of speech production is used to motivate feature selection, while also performing a smoothing operation on features across several domains of interest. 
Then, these composite features are used to perform a much wider range of analyses than is typical of previous studies, looking at everything from diagnosis to functional competency assessments. Lastly, potential improvements to address practical implementation challenges associated with the use of speech and language technology in a real-world environment are investigated. The goal of this work is to demonstrate the ability of speech and language technology to aid clinical practitioners toward improvements in quality of life outcomes for their patients.
A Wearable Real-Time Auditory Feedback System to Improve Gait and Posture in Parkinson’s Disease

Nearly one percent of the population over 65 years of age is living with Parkinson’s disease (PD) and this population worldwide is projected to be approximately nine million by 2030. PD is a progressive neurological disease characterized by both motor and cognitive impairments. One of the most serious challenges for an individual as the disease progresses is the increasing severity of gait and posture impairments since they result in debilitating conditions such as freezing of gait, increased likelihood of falls, and poor quality of life. Although dopaminergic therapy and deep brain stimulation are generally effective, they often fail to improve gait and posture deficits. Several recent studies have employed real-time feedback (RTF) of gait parameters to improve walking patterns in PD. In earlier work, results from the investigation of the effects of RTF of step length and back angle during treadmill walking demonstrated that people with PD could follow the feedback and utilize it to modulate movements favorably in a manner that transferred, at least acutely, to overground walking. In this work, recent advances in wearable technologies were leveraged to develop a wearable real-time feedback (WRTF) system that can monitor and evaluate movements and provide feedback during daily activities that involve overground walking. Specifically, this work addressed the challenges of obtaining accurate gait and posture measures from wearable sensors in real-time and providing auditory feedback on the calculated real-time measures for rehabilitation. An algorithm was developed to calculate gait and posture variables from wearable sensor measurements, which were then validated against gold-standard measurements. The WRTF system calculates these measures and provides auditory feedback in real-time. The WRTF system was evaluated as a potential rehabilitation tool for use by people with mild to moderate PD. 
Results from the study indicated that the system can accurately measure step length and back angle, and that subjects could respond to real-time auditory feedback in a manner that improved their step length and uprightness. These improvements were exhibited while using the system that provided feedback and were sustained in subsequent trials immediately thereafter in which subjects walked without receiving feedback from the system.
Analyzing Multi-viewpoint Capabilities of Light Estimation Frameworks for Augmented Reality Using TCP/IP and UDP

Realistic lighting is important to improve immersion and make mixed reality applications seem more plausible. To properly blend AR objects into the real scene, it is important to study the lighting of the environment. The existing illumination frameworks proposed by Google’s ARCore (Google’s Augmented Reality Software Development Kit) and Apple’s ARKit (Apple’s Augmented Reality Software Development Kit) are computationally expensive and have very slow refresh rates, which make them unsuitable for dynamic environments and low-end mobile devices. Recently, other illumination estimation frameworks, such as GLEAM and Xihe, have aimed at providing better illumination with faster refresh rates. GLEAM is an illumination estimation framework that understands the real scene by collecting pixel data from a reflecting spherical light probe. GLEAM uses this data to form environment cubemaps, which are later mapped onto a reflection probe to generate illumination for AR objects. From a single viewpoint, only one half of the light probe can be observed at a time, which does not give complete information about the environment. This leads to the idea of multi-viewpoint estimation for better performance. This thesis analyzes the multi-viewpoint capabilities of AR illumination frameworks that use physical light probes to understand the environment. The current work builds networking on top of GLEAM using the TCP and UDP protocols. This thesis also documents how processor load sharing is done across networked devices and how that benefits the performance of GLEAM on mobile devices. Some enhancements using multi-threading have also been made to the existing GLEAM model to improve its performance.
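As an illustration of how light-probe samples from a second viewpoint might be shared between devices, here is a minimal sketch. It does not reproduce GLEAM's actual networking code, which the abstract does not describe. The wire format (a viewpoint id, a sample count, then `(u, v, r, g, b)` byte records) and the function names `encode_samples`, `decode_samples`, and `send_over_udp_loopback` are all assumptions made for this example.

```python
import socket
import struct

# Hypothetical wire format (not GLEAM's): viewpoint id (uint16),
# sample count (uint16), then one (u, v, r, g, b) record per sample
# stored as five unsigned bytes, all in network byte order.
HEADER = struct.Struct("!HH")
SAMPLE = struct.Struct("!5B")


def encode_samples(viewpoint_id, samples):
    """Pack light-probe pixel samples from one viewpoint into bytes."""
    payload = bytearray(HEADER.pack(viewpoint_id, len(samples)))
    for sample in samples:
        payload += SAMPLE.pack(*sample)
    return bytes(payload)


def decode_samples(packet):
    """Inverse of encode_samples: returns (viewpoint_id, samples)."""
    viewpoint_id, count = HEADER.unpack_from(packet, 0)
    samples = [
        SAMPLE.unpack_from(packet, HEADER.size + i * SAMPLE.size)
        for i in range(count)
    ]
    return viewpoint_id, samples


def send_over_udp_loopback(packet):
    """Send one datagram to a receiver on localhost and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)
        sender.sendto(packet, receiver.getsockname())
        data, _ = receiver.recvfrom(65535)
        return data
```

Calling `decode_samples(send_over_udp_loopback(encode_samples(1, samples)))` would exercise the round trip on a single machine. UDP is connectionless and may drop or reorder datagrams, whereas TCP provides ordered, reliable delivery at the cost of connection setup and retransmission overhead, so the choice between the two is a latency versus reliability trade-off.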
Recursive Bayesian Estimation on Projective Spaces: Theoretical Foundations and Practical Algorithms

This thesis develops geometrically and statistically rigorous foundations for multivariate analysis and Bayesian inference posed on Grassmannian manifolds. Requisite to the development of key elements of statistical theory in a geometric realm are closed-form, analytic expressions for many differential geometric objects, e.g., tangent vectors, metrics, geodesics, and volume forms. The first part of this thesis is devoted to a mathematical exposition of these. In particular, it leverages the classical work of Alan James to derive the exterior calculus of differential forms on special Grassmannians for invariant measures with respect to which integration is permissible. Motivated by various multi-sensor remote sensing applications, the second part of this thesis describes the problem of recursively estimating the state of a dynamical system propagating on the Grassmann manifold. Fundamental to the Bayesian treatment of this problem is the choice of a suitable probability distribution to model the state a priori. Using the Method of Maximum Entropy, maximum-entropy probability distributions on the state space are derived from the developed geometric theory. Statistical analyses of these distributions, including parameter estimation, are also presented. These probability distributions and the statistical analysis thereof are original contributions. Using the Bayesian framework, two recursive estimation algorithms, both of which rely on noisy measurements on (special cases of) the Grassmann manifold, are then devised and implemented numerically. The first is applied to an idealized scenario, the second to a more practically motivated scenario. The novelty of both of these algorithms lies in the use of the derived maximum-entropy probability measures as models for the priors. Numerical simulations demonstrate that, under mild assumptions, both estimation algorithms produce accurate and statistically meaningful outputs.
This thesis aims to chart the interface between differential geometry and statistical signal processing. It is my deepest hope that the geometric-statistical approach underlying this work facilitates and encourages the development of new theories and new computational methods in geometry. Application of these, in turn, will bring new insights and better solutions to a number of extant and emerging problems in signal processing.
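The Method of Maximum Entropy mentioned in the abstract above can be sketched in its generic form. The following is the standard textbook derivation, not the thesis's specific result: the manifold $\mathcal{M}$, the invariant measure $\mu$, the constraint functions $f_i$, and the target values $c_i$ are placeholder symbols assumed here for illustration.

```latex
% Maximize the entropy of a density p with respect to an invariant measure \mu on \mathcal{M}
\max_{p}\; H(p) = -\int_{\mathcal{M}} p(x)\,\log p(x)\,\mathrm{d}\mu(x)
\quad\text{subject to}\quad
\int_{\mathcal{M}} p\,\mathrm{d}\mu = 1,
\qquad
\int_{\mathcal{M}} f_i\,p\,\mathrm{d}\mu = c_i,\quad i = 1,\dots,k.

% Lagrangian with multipliers \lambda_0, \lambda_1, \dots, \lambda_k
\mathcal{L}(p) = -\int_{\mathcal{M}} p\log p\,\mathrm{d}\mu
 - \lambda_0\Big(\int_{\mathcal{M}} p\,\mathrm{d}\mu - 1\Big)
 - \sum_{i=1}^{k}\lambda_i\Big(\int_{\mathcal{M}} f_i\,p\,\mathrm{d}\mu - c_i\Big).

% Pointwise stationarity with respect to p(x)
-\log p(x) - 1 - \lambda_0 - \sum_{i=1}^{k}\lambda_i f_i(x) = 0
\;\;\Longrightarrow\;\;
p(x) = \frac{1}{Z(\lambda)}\exp\!\Big(-\sum_{i=1}^{k}\lambda_i f_i(x)\Big),
\qquad
Z(\lambda) = \int_{\mathcal{M}} \exp\!\Big(-\sum_{i=1}^{k}\lambda_i f_i(x)\Big)\mathrm{d}\mu(x).
```

The resulting density is an exponential family with respect to $\mu$. Evaluating the normalizer $Z(\lambda)$ requires integrating against the invariant measure, which is presumably where closed-form expressions for volume forms on the manifold become necessary.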
Augnosis: Self-Diagnosis in Augmented Reality

Patients often struggle to accurately describe their symptoms to medical professionals, which can produce erroneous diagnoses that delay or prevent treatment. My app, Augnosis, aims to streamline constructive communication between patient and doctor and allow for more accurate diagnoses. The goal of this project was to create an app capable of gathering data on visual symptoms of facial acne and categorizing it to differentiate between diagnoses using image recognition and identification. “Augnosis” is a combination of the words “Augmented Reality” and “Self-Diagnosis”, the former being the medium in which it is immersed and the latter detailing its functionality.
