The poor spectral and temporal resolution of cochlear implants (CIs) limit their users’ music enjoyment. Remixing music by boosting vocals while attenuating spectrally complex instruments has been shown to benefit music enjoyment of postlingually deaf CI users. However, the effectiveness…
The poor spectral and temporal resolution of cochlear implants (CIs) limit their users’ music enjoyment. Remixing music by boosting vocals while attenuating spectrally complex instruments has been shown to benefit music enjoyment of postlingually deaf CI users. However, the effectiveness of music remixing in prelingually deaf CI users is still unknown. This study compared the music-remixing preferences of nine postlingually deaf, late-implanted CI users and seven prelingually deaf, early-implanted CI users, as well as their ratings of song familiarity and vocal pleasantness. Twelve songs were selected from the most streamed tracks on Spotify for testing. There were six remixed versions of each song: Original, Music-6 (6-dB attenuation of all instruments), Music-12 (12-dB attenuation of all instruments), Music-3-3-12 (3-dB attenuation of bass and drums and 12-dB attenuation of other instruments), Vocals-6 (6-dB attenuation of vocals), and Vocals-12 (12-dB attenuation of vocals). It was found that the prelingual group preferred the Music-6 and Original versions over the other versions, while the postlingual group preferred the Vocals-12 version over the Music-12 version. The prelingual group was more familiar with the songs than the postlingual group. However, the song familiarity rating did not significantly affect the patterns of preference ratings in each group. The prelingual group also had higher vocal pleasantness ratings than the postlingual group. For the prelingual group, higher vocal pleasantness led to higher preference ratings for the Music-12 version. For the postlingual group, their overall preference for the Vocals-12 version was driven by their preference ratings for songs with very unpleasant vocals. These results suggest that the patient factor of auditory experience and stimulus factor of vocal pleasantness may affect the music-remixing preferences of CI users. As such, the music-remixing strategy needs to be customized for individual patients and songs.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
This study aimed to evaluate the efficacy of Apple AirPods pro (2nd generation) Live Listen feature in enhancing word recognition and memory retention among individuals with varying degrees of hearing loss, as determined by their Signal-to-Noise Ratio (SNR) loss. Utilizing…
This study aimed to evaluate the efficacy of Apple AirPods pro (2nd generation) Live Listen feature in enhancing word recognition and memory retention among individuals with varying degrees of hearing loss, as determined by their Signal-to-Noise Ratio (SNR) loss. Utilizing a single-group experimental design, the research measured participants' performance on word recognition and memory retention tasks with and without the Live Listen feature. Statistical analysis, including paired t-tests and linear regression, revealed significant improvements in word recognition (from 81.8% to 94.4%) and memory retention (from 43.8% to 59.4%) scores when the Live Listen feature was activated. Moreover, a positive correlation between SNR loss and recognition score improvements suggested a greater benefit for those with higher levels of hearing loss. However, the relationship with memory retention improvements was less pronounced. These findings underscore the potential of the Live Listen feature as an effective assistive listening device, highlighting its importance in enhancing auditory experiences for individuals with hearing impairments and encouraging further research into personalized auditory assistance technologies in noisy healthcare environments.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
This dissertation centers on the development of Bayesian methods for learning differ- ent types of variation in switching nonlinear gene regulatory networks (GRNs). A new nonlinear and dynamic multivariate GRN model is introduced to account for different sources of variability…
This dissertation centers on the development of Bayesian methods for learning differ- ent types of variation in switching nonlinear gene regulatory networks (GRNs). A new nonlinear and dynamic multivariate GRN model is introduced to account for different sources of variability in GRNs. The new model is aimed at more precisely capturing the complexity of GRN interactions through the introduction of time-varying kinetic order parameters, while allowing for variability in multiple model parameters. This model is used as the drift function in the development of several stochastic GRN mod- els based on Langevin dynamics. Six models are introduced which capture intrinsic and extrinsic noise in GRNs, thereby providing a full characterization of a stochastic regulatory system. A Bayesian hierarchical approach is developed for learning the Langevin model which best describes the noise dynamics at each time step. The trajectory of the state, which are the gene expression values, as well as the indicator corresponding to the correct noise model are estimated via sequential Monte Carlo (SMC) with a high degree of accuracy. To address the problem of time-varying regulatory interactions, a Bayesian hierarchical model is introduced for learning variation in switching GRN architectures with unknown measurement noise covariance. The trajectory of the state and the indicator corresponding to the network configuration at each time point are estimated using SMC. This work is extended to a fully Bayesian hierarchical model to account for uncertainty in the process noise covariance associated with each network architecture. An SMC algorithm with local Gibbs sampling is developed to estimate the trajectory of the state and the indicator correspond- ing to the network configuration at each time point with a high degree of accuracy. The results demonstrate the efficacy of Bayesian methods for learning information in switching nonlinear GRNs.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
Speech analysis for clinical applications has emerged as a burgeoning field, providing valuable insights into an individual's physical and physiological state. Researchers have explored speech features for clinical applications, such as diagnosing, predicting, and monitoring various pathologies. Before presenting the…
Speech analysis for clinical applications has emerged as a burgeoning field, providing valuable insights into an individual's physical and physiological state. Researchers have explored speech features for clinical applications, such as diagnosing, predicting, and monitoring various pathologies. Before presenting the new deep learning frameworks, this thesis introduces a study on conventional acoustic feature changes in subjects with post-traumatic headache (PTH) attributed to mild traumatic brain injury (mTBI). This work demonstrates the effectiveness of using speech signals to assess the pathological status of individuals. At the same time, it highlights some of the limitations of conventional acoustic and linguistic features, such as low repeatability and generalizability. Two critical characteristics of speech features are (1) good robustness, as speech features need to generalize across different corpora, and (2) high repeatability, as speech features need to be invariant to all confounding factors except the pathological state of targets. This thesis presents two research thrusts in the context of speech signals in clinical applications that focus on improving the robustness and repeatability of speech features, respectively. The first thrust introduces a deep learning framework to generate acoustic feature embeddings sensitive to vocal quality and robust across different corpora. A contrastive loss combined with a classification loss is used to train the model jointly, and data-warping techniques are employed to improve the robustness of embeddings. Empirical results demonstrate that the proposed method achieves high in-corpus and cross-corpus classification accuracy and generates good embeddings sensitive to voice quality and robust across different corpora. The second thrust introduces using the intra-class correlation coefficient (ICC) to evaluate the repeatability of embeddings. A novel regularizer, the ICC regularizer, is proposed to regularize deep neural networks to produce embeddings with higher repeatability. This ICC regularizer is implemented and applied to three speech applications: a clinical application, speaker verification, and voice style conversion. The experimental results reveal that the ICC regularizer improves the repeatability of learned embeddings compared to the contrastive loss, leading to enhanced performance in downstream tasks.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
Generative models are deep neural network-based models trained to learn the underlying distribution of a dataset. Once trained, these models can be used to sample novel data points from this distribution. Their impressive capabilities have been manifested in various generative…
Generative models are deep neural network-based models trained to learn the underlying distribution of a dataset. Once trained, these models can be used to sample novel data points from this distribution. Their impressive capabilities have been manifested in various generative tasks, encompassing areas like image-to-image translation, style transfer, image editing, and more. One notable application of generative models is data augmentation, aimed at expanding and diversifying the training dataset to augment the performance of deep learning models for a downstream task. Generative models can be used to create new samples similar to the original data but with different variations and properties that are difficult to capture with traditional data augmentation techniques. However, the quality, diversity, and controllability of the shape and structure of the generated samples from these models are often directly proportional to the size and diversity of the training dataset. A more extensive and diverse training dataset allows the generative model to capture overall structures present in the data and generate more diverse and realistic-looking samples.
In this dissertation, I present innovative methods designed to enhance the robustness and controllability of generative models, drawing upon physics-based, probabilistic, and geometric techniques. These methods help improve the generalization and controllability of the generative model without necessarily relying on large training datasets. I enhance the robustness of generative models by integrating classical geometric moments for shape awareness and minimizing trainable parameters. Additionally, I employ non-parametric priors for the generative model's latent space through basic probability and optimization methods to improve the fidelity of interpolated images. I adopt a hybrid approach to address domain-specific challenges with limited data and controllability, combining physics-based rendering with generative models for more realistic results.
These approaches are particularly relevant in industrial settings, where the training datasets are small and class imbalance is common. Through extensive experiments on various datasets, I demonstrate the effectiveness of the proposed methods over conventional approaches.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
Stress, depression, and anxiety are prevailing mental health issues that affect individuals worldwide. As the search for effective solutions continues, advancements in technology have led to the development of digital tools for stress identification and management purposes. The Cigna StressWaves…
Stress, depression, and anxiety are prevailing mental health issues that affect individuals worldwide. As the search for effective solutions continues, advancements in technology have led to the development of digital tools for stress identification and management purposes. The Cigna StressWaves Test (CSWT) is a publicly available stress analysis toolkit that claims to use “clinical-grade” artificial intelligence (AI) technology to evaluate individual stress levels through speech. To investigate their claim, this research stands as an independent validation study involving 60 participants over the age of 18. The primary objective of the study is to assess the repeatability and efficacy of the CSWT as a stress measurement tool. Key results indicate the CSWT lacks test-retest reliability and convergent validity. This implies that the CWST is not a repeatable tool and does not provide similar stress outcomes relative to an established measure of stress, the Perceived Stress Scale (PSS). These findings cast doubt on the accuracy and effectiveness of the CWST as a stress assessment tool. The public availability of the CSWT and the claim that it is a “clinical-grade” tool highlights concerns regarding premature deployment of digital health tools for stress management.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
This dissertation explores applications of machine learning methods in service of the design of screening tests, which are ubiquitous in applications from social work, to criminology, to healthcare. In the first part, a novel Bayesian decision theory framework is presented…
This dissertation explores applications of machine learning methods in service of the design of screening tests, which are ubiquitous in applications from social work, to criminology, to healthcare. In the first part, a novel Bayesian decision theory framework is presented for designing tree-based adaptive tests. On an application to youth delinquency in Honduras, the method produces a 15-item instrument that is almost as accurate as a full-length 150+ item test. The framework includes specific considerations for the context in which the test will be administered, and provides uncertainty quantification around the trade-offs of shortening lengthy tests. In the second part, classification complexity is explored via theoretical and empirical results from statistical learning theory, information theory, and empirical data complexity measures. A simulation study that explicitly controls two key aspects of classification complexity is performed to relate the theoretical and empirical approaches. Throughout, a unified language and notation that formalizes classification complexity is developed; this same notation is used in subsequent chapters to discuss classification complexity in the context of a speech-based screening test. In the final part, the relative merits of task and feature engineering when designing a speech-based cognitive screening test are explored. Through an extensive classification analysis on a clinical speech dataset from patients with normal cognition and Alzheimer’s disease, the speech elicitation task is shown to have a large impact on test accuracy; carefully performed task and feature engineering are required for best results. A new framework for objectively quantifying speech elicitation tasks is introduced, and two methods are proposed for automatically extracting insights into the aspects of the speech elicitation task that are driving classification performance. The dissertation closes with recommendations for how to evaluate the obtained insights and use them to guide future design of speech-based screening tests.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
The past decade witnessed the success of deep learning models in various applications of computer vision and natural language processing. This success can be predominantly attributed to the (i) availability of large amounts of training data; (ii) access of domain…
The past decade witnessed the success of deep learning models in various applications of computer vision and natural language processing. This success can be predominantly attributed to the (i) availability of large amounts of training data; (ii) access of domain aware knowledge; (iii) i.i.d assumption between the train and target distributions and (iv) belief on existing metrics as reliable indicators of performance. When any of these assumptions are violated, the models exhibit brittleness producing adversely varied behavior. This dissertation focuses on methods for accurate model design and characterization that enhance process reliability when certain assumptions are not met. With the need to safely adopt artificial intelligence tools in practice, it is vital to build reliable failure detectors that indicate regimes where the model must not be invoked. To that end, an error predictor trained with a self-calibration objective is developed to estimate loss consistent with the underlying model. The properties of the error predictor are described and their utility in supporting introspection via feature importances and counterfactual explanations is elucidated. While such an approach can signal data regime changes, it is critical to calibrate models using regimes of inlier (training) and outlier data to prevent under- and over-generalization in models i.e., incorrectly identifying inliers as outliers and vice-versa. By identifying the space for specifying inliers and outliers, an anomaly detector that can effectively flag data of varying semantic complexities in medical imaging is next developed. Uncertainty quantification in deep learning models involves identifying sources of failure and characterizing model confidence to enable actionability. A training strategy is developed that allows the accurate estimation of model uncertainties and its benefits are demonstrated for active learning and generalization gap prediction. This helps identify insufficiently sampled regimes and representation insufficiency in models. In addition, the task of deep inversion under data scarce scenarios is considered, which in practice requires a prior to control the optimization. By identifying limitations in existing work, data priors powered by generative models and deep model priors are designed for audio restoration. With relevant empirical studies on a variety of benchmarks, the need for such design strategies is demonstrated.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
In this thesis, six experiments which were computer simulations were conducted in order to replicate the negative association between sample size and accuracy that is repeatedly found in ML literature by accounting for data leakage and publication bias. The reason…
In this thesis, six experiments which were computer simulations were conducted in order to replicate the negative association between sample size and accuracy that is repeatedly found in ML literature by accounting for data leakage and publication bias. The reason why it is critical to understand why this negative association is occurring is that in published studies, there have been multiple reports that the accuracies in ML models are overoptimistic leading to cases where the results are irreproducible despite conducting multiple trials and experiments. Additionally, after replicating the negative association between sample size and accuracy, parametric curves (learning curves with the parametric function) were fitted along the empirical learning curves in order to evaluate the performance. It was found that there is a significant variance in accuracies when the sample size is small, but little to no variation when the sample size is large. In other words, the empirical learning curves with data leakage and publication bias were able to achieve the same accuracy as the learning curve without data leakage at a large sample size.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
In the standard pipeline for machine learning model development, several design decisions are made largely based on trial and error. Take the classification problem as an example. The starting point for classifier design is a dataset with samples from the…
In the standard pipeline for machine learning model development, several design decisions are made largely based on trial and error. Take the classification problem as an example. The starting point for classifier design is a dataset with samples from the classes of interest. From this, the algorithm developer must decide which features to extract, which hypothesis class to condition on, which hyperparameters to select, and how to train the model. The design process is iterative with the developer trying different classifiers, feature sets, and hyper-parameters and using cross-validation to pick the model with the lowest error. As there are no guidelines for when to stop searching, developers can continue "optimizing" the model to the point where they begin to "fit to the dataset". These problems are amplified in the active learning setting, where the initial dataset may be unlabeled and label acquisition is costly. The aim in this dissertation is to develop algorithms that provide ML developers with additional information about the complexity of the underlying problem to guide downstream model development. I introduce the concept of "meta-features" - features extracted from a dataset that characterize the complexity of the underlying data generating process. In the context of classification, the complexity of the problem can be characterized by understanding two complementary meta-features: (a) the amount of overlap between classes, and (b) the geometry/topology of the decision boundary. Across three complementary works, I present a series of estimators for the meta-features that characterize overlap and geometry/topology of the decision boundary, and demonstrate how they can be used in algorithm development.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)