Multimodal Data Analysis of Dyadic Interactions for an Automated Feedback System Supporting Parent Implementation of Pivotal Response Treatment

Parents play a pivotal role in the early childhood development of social and communication skills. In children with autism, the development of these skills can be delayed. Applied behavior analysis (ABA) techniques have been created to aid in skill acquisition. Among these, pivotal response treatment (PRT) has been empirically shown to foster improvements in these skills. Research into PRT implementation has also shown that parents can be trained to be effective interventionists for their children. The current difficulty in PRT training is how to disseminate training to the parents who need it, and how to support and motivate practitioners after training.

Evaluation of the parents’ fidelity of implementation is often undertaken using video probes that depict the dyadic interaction between the parent and the child during PRT sessions. These videos are time-consuming for clinicians to process and often result in only minimal feedback for the parents. Current technology could be used to reduce the manual cost of extracting data from the videos, creating greater opportunities for clinician-created feedback as well as automated assessments.

The naturalistic context of the video probes, along with the reliance on ubiquitous recording devices, creates a difficult scenario for classification tasks. The domain of the PRT video probes can be expected to have high levels of both aleatory and epistemic uncertainty. Addressing these challenges requires examining the multimodal data as well as implementing and evaluating classification algorithms. This is explored through the use of a new dataset of PRT videos.
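The difference between the two kinds of uncertainty can be made concrete with a standard information-theoretic decomposition. Assuming the classifier is an ensemble (or any model that yields several probability vectors per video segment), the entropy of the averaged prediction splits into the average entropy of the individual members, which reflects noise inherent in the data (aleatory), and the remainder, the mutual information, which reflects disagreement between members and hence the model's lack of knowledge (epistemic). This is an illustrative sketch, not the analysis performed in the thesis; the function name and array layout are assumptions.

```python
import numpy as np

def uncertainty_decomposition(member_probs, eps=1e-12):
    """Split an ensemble's predictive uncertainty into aleatory and epistemic parts.

    member_probs: array of shape (n_members, n_samples, n_classes);
    each member_probs[m, i] is a probability vector over classes.
    """
    p = np.asarray(member_probs, dtype=float)
    mean_p = p.mean(axis=0)  # ensemble predictive distribution per sample
    # Total uncertainty: entropy of the averaged prediction.
    total = -np.sum(mean_p * np.log(mean_p + eps), axis=-1)
    # Aleatory part: average entropy of each member's own prediction.
    aleatory = -np.sum(p * np.log(p + eps), axis=-1).mean(axis=0)
    # Epistemic part: mutual information, i.e. how much the members disagree.
    epistemic = total - aleatory
    return total, aleatory, epistemic
```

Members that agree on an uncertain 50/50 prediction yield high aleatory and near-zero epistemic uncertainty; confident members that contradict each other yield the reverse.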

The relationship between the parent and the clinician is important. The clinician can provide support and help build self-efficacy in addition to providing knowledge and modeling treatment procedures. Facilitating this relationship alongside automated feedback not only provides the opportunity to present expert feedback to the parent, but also allows the clinician to help personalize the classification models. By using a human-in-the-loop framework, clinicians can help address the uncertainty in the classification models by providing additional labeled samples. This allows the system to improve classification and provides a person-centered approach to extracting multimodal data from PRT video probes.
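One common way to decide which samples a clinician should label in such a human-in-the-loop framework is least-confidence sampling: clips whose top predicted class has the lowest probability are sent for review first. The sketch below assumes that strategy, which the abstract does not specify; `review_round` and the `ask_clinician` callback are hypothetical names standing in for whatever interface the clinician actually uses.

```python
import numpy as np

def select_for_review(probs, k):
    """Least-confidence sampling: indices of the k clips whose top-class probability is lowest."""
    confidence = np.asarray(probs, dtype=float).max(axis=1)
    return np.argsort(confidence, kind="stable")[:k]

def review_round(probs, clip_ids, labeled, ask_clinician, k=5):
    """One human-in-the-loop round: send the k least confident clips to a clinician.

    probs: (n_clips, n_classes) predicted probabilities for the clips in clip_ids
    labeled: dict mapping clip_id -> label, updated in place and returned
    ask_clinician: hypothetical callback returning the clinician's label for a clip
    """
    for i in select_for_review(probs, k):
        clip_id = clip_ids[i]
        labeled[clip_id] = ask_clinician(clip_id)
    return labeled
```

After each round, the model would be retrained on the enlarged labeled set and the probabilities recomputed before the next round.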
Deep domain fusion for adaptive image classification

Endowing machines with the ability to understand digital images is a critical task for a host of high-impact applications, including pathology detection in radiographic imaging, autonomous vehicles, and assistive technology for the visually impaired. Computer vision systems rely on large corpora of annotated data to train task-specific visual recognition models. Despite significant advances over the past decade, the fact remains that collecting and annotating the data needed to successfully train a model is prohibitively expensive. Moreover, these models are prone to rapid performance degradation when applied to data sampled from a different domain. Recent work on deep adaptation networks seeks to overcome these challenges by facilitating transfer learning between source and target domains. In parallel, the unification of dominant semi-supervised learning techniques has shown unprecedented potential for using unlabeled data to train classification models despite very small sets of annotated data.

In this thesis, a novel domain adaptation algorithm, Domain Adaptive Fusion (DAF), is proposed. DAF encourages a domain-invariant linear relationship between the pixel space of different domains and the prediction space while being trained under a domain-adversarial signal. The combination of key components from unsupervised domain adaptation and semi-supervised learning enables DAF to effectively bridge the gap between source and target domains. Experiments performed on computer vision benchmark datasets for domain adaptation support the efficacy of this hybrid approach, which outperforms all of the baseline architectures on most of the transfer tasks.
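The abstract does not spell out how the linear relationship between pixel space and prediction space is enforced. One plausible reading, given its reference to unified semi-supervised techniques, is mixup-style interpolation: a source image and a target image are blended with a coefficient λ, and the training target for the blend is the same blend of the source label and the model's guessed prediction for the target image. The sketch below assumes that reading; the function name, the Beta(α, α) sampling, α = 0.75, and the max(λ, 1 − λ) step are borrowed from MixMatch as illustrative choices, not details taken from the thesis, and the domain-adversarial branch is omitted.

```python
import numpy as np

def cross_domain_mixup(x_src, y_src, x_tgt, p_tgt, alpha=0.75, rng=None):
    """Blend a source batch and a target batch in pixel space and label space with one coefficient.

    x_src, x_tgt: image batches of identical shape, e.g. (batch, height, width, channels)
    y_src: one-hot source labels, shape (batch, n_classes)
    p_tgt: predicted class probabilities for the unlabeled target batch, same shape as y_src
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)  # as in MixMatch: keep each blend closer to its source sample
    x_mix = lam * x_src + (1.0 - lam) * x_tgt  # interpolation in pixel space
    y_mix = lam * y_src + (1.0 - lam) * p_tgt  # the same interpolation in prediction space
    return x_mix, y_mix, lam
```

Training a classifier to predict `y_mix` from `x_mix` penalizes predictions that do not vary linearly along the path between a source image and a target image.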
Representation, Exploration, and Recommendation of Music Playlists

Playlists have become a significant part of the music listening experience today because of digital cloud-based services such as Spotify, Pandora, and Apple Music. Owing to the meteoric rise in playlist usage, recommending playlists is crucial to music services today. Although playlist prediction has received a great deal of research attention, playlist representation has received comparatively little. Over the last few years, sequence-to-sequence models, especially in the field of natural language processing, have shown the effectiveness of learned embeddings in capturing the semantic characteristics of sequences. Similar concepts can be applied to music to learn fixed-length representations for playlists, and the learned representations can then be used for downstream tasks such as playlist comparison and recommendation.

In this thesis, the problem of learning a fixed-length representation is formulated in an unsupervised manner using Neural Machine Translation (NMT), where playlists are interpreted as sentences and songs as words. This approach is compared with other encoding architectures and evaluated using the suite of tasks commonly used for evaluating sentence embeddings, along with a few additional tasks pertaining to music. The aim of the evaluation is to study the traits captured by the playlist embeddings so that they can be leveraged for music recommendation. This work lays the foundation for analyzing music playlists and learning the patterns that exist in them in an end-to-end manner. The thesis concludes with a discussion of future directions for this research and its potential impact in the domain of Music Information Retrieval.
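To make the "playlists as sentences, songs as words" framing concrete, the sketch below builds a fixed-length playlist vector by averaging song vectors, a bag-of-words style baseline of the kind commonly compared against learned sentence encoders, and compares two playlists with cosine similarity. It is not the NMT encoder described above; the song vectors and function names are hypothetical placeholders for embeddings learned elsewhere.

```python
import numpy as np

def playlist_embedding(playlist, song_vectors):
    """Average the vectors of all known songs into one fixed-length playlist vector.

    playlist: sequence of song IDs (the "words" of the playlist "sentence")
    song_vectors: dict mapping song ID -> 1-D numpy array (hypothetical pre-trained embeddings)
    """
    vecs = [song_vectors[s] for s in playlist if s in song_vectors]  # skip unknown songs
    if not vecs:
        raise ValueError("playlist contains no songs with known vectors")
    return np.mean(vecs, axis=0)

def cosine_similarity(a, b):
    """Similarity of two playlist vectors, independent of their magnitudes."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because every playlist maps to a vector of the same length regardless of how many songs it contains, playlists of different sizes can be compared directly, which is the property the learned representations are meant to provide.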
Learning Transferable Data Representations Using Deep Generative Models

Machine learning models convert raw data in the form of video, images, audio, text, etc. into feature representations that are convenient for computational processing. Deep neural networks have proven to be very efficient feature extractors for a variety of machine learning tasks. Generative models based on deep neural networks introduce constraints on the feature space to learn transferable and disentangled representations. Transferable feature representations help in training machine learning models that are robust across different distributions of data. For example, with the application of transferable features in domain adaptation, models trained on a source distribution can be applied to data from a target distribution even though the distributions may be different. In style transfer and image-to-image translation, disentangled representations allow for the separation of style and content when translating images.

This thesis examines learning transferable data representations in novel deep generative models. The Semi-Supervised Adversarial Translator (SAT) uses adversarial methods and cross-domain weight sharing in a neural network to extract transferable representations. These transferable representations can then be decoded into the original image or a similar image in another domain. The Explicit Disentangling Network (EDN) uses generative methods to disentangle images into their core attributes and then segments sets of related attributes. The EDN can separate these attributes by controlling the flow of information using a novel combination of losses and network architecture. This separation of attributes allows precise modifications to specific components of the data representation, boosting the performance of machine learning tasks. The effectiveness of these models is evaluated across domain adaptation, style transfer, and image-to-image translation tasks.
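Two ideas from this abstract can be illustrated in a few lines of NumPy, under stated assumptions. The first part sketches cross-domain weight sharing in the spirit of SAT: each domain has its own input layer, but both feed the same shared layer, so the shared weights must produce features that serve both domains. The second part sketches the kind of precise edit a disentangled code permits: if a latent vector is partitioned into attribute segments, one segment can be swapped between two codes while the rest is left untouched. The class, the layer sizes, and the segment names and boundaries are hypothetical; neither part reproduces the actual SAT or EDN architectures or losses.

```python
import numpy as np

class SharedEncoder:
    """Domain-specific input layers feeding one shared layer (a sketch of cross-domain weight sharing)."""

    def __init__(self, in_dim, hidden_dim, latent_dim, domains, rng):
        # One private projection per domain (hypothetical layer sizes).
        self.private = {d: rng.normal(0.0, 0.1, (in_dim, hidden_dim)) for d in domains}
        # A single weight matrix used by every domain.
        self.shared = rng.normal(0.0, 0.1, (hidden_dim, latent_dim))

    def encode(self, x, domain):
        h = np.maximum(0.0, x @ self.private[domain])  # ReLU after the domain-specific layer
        return np.maximum(0.0, h @ self.shared)         # ReLU after the shared layer

# Hypothetical layout of an 8-dimensional disentangled code.
SEGMENTS = {"content": slice(0, 4), "style": slice(4, 8)}

def swap_segment(z_a, z_b, name, segments=SEGMENTS):
    """Return a copy of z_a whose `name` segment is taken from z_b; all other segments are unchanged."""
    z_new = np.array(z_a, dtype=float, copy=True)
    z_new[segments[name]] = z_b[segments[name]]
    return z_new
```

With a trained decoder, decoding `swap_segment(z_photo, z_painting, "style")` would ideally render the photo's content in the painting's style; how cleanly that works depends entirely on how well the segments were disentangled during training.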
Addressing Problems Facing Unmanned Aerial System Scheduling Systems in Urban Environments

Research literature was reviewed to find recommended tools and technologies for operating Unmanned Aerial System (UAS) fleets in an urban environment. However, restrictive legislation prohibits fully autonomous flight without an operator. Existing literature covers considerations for operating UAS fleets in a controlled environment, with an emphasis on how different networking approaches affect the topology of the UAS network. UAS communications are most commonly implemented with 802.11 protocols, which can transmit telemetry and a video stream using off-the-shelf hardware. Other implementations use low-frequency radios for long-distance communication, or higher-latency 4G LTE modems to access existing network infrastructure. However, a gap remains in testing different network topologies outside of a controlled environment.

With the correct permits in place, further research can explore how different UAS network topologies behave in an urban environment when implemented with off-the-shelf UAS hardware. In addition to testing different network topologies, this thesis covers the implementation of a secure, scalable system, built with modern cloud computing tools and services, that can support a variable number of UAS. The system also supports end-to-end simulation that accounts for factors such as battery life and realistic UAS kinematics. Implementing the system yields new findings needed to deploy UAS fleets in urban environments.
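A simulation that accounts for battery life and kinematics needs, at minimum, a per-step update of position, velocity, and remaining energy. The sketch below is a deliberately simple stand-in, not the thesis's simulator: it assumes a 2-D point-mass vehicle with hypothetical speed and acceleration limits, and a constant power draw, so energy use is just power multiplied by time. All names and parameter values are illustrative.

```python
import math
from dataclasses import dataclass

@dataclass
class UASState:
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0
    battery_wh: float = 100.0  # hypothetical usable battery energy

def step(s, target, dt=0.1, max_speed=15.0, max_accel=4.0, power_w=400.0):
    """Advance one UAS by dt seconds toward a 2-D target under speed and acceleration limits."""
    dx, dy = target[0] - s.x, target[1] - s.y
    dist = math.hypot(dx, dy)
    # Desired speed: cruise at max_speed, but no faster than max_accel can brake from before the target.
    v_des = min(max_speed, math.sqrt(2.0 * max_accel * dist))
    ux, uy = (dx / dist, dy / dist) if dist > 0 else (0.0, 0.0)
    ax, ay = (v_des * ux - s.vx) / dt, (v_des * uy - s.vy) / dt
    a = math.hypot(ax, ay)
    if a > max_accel:  # clip the commanded acceleration to the vehicle's limit
        ax, ay = ax * max_accel / a, ay * max_accel / a
    s.vx += ax * dt
    s.vy += ay * dt
    s.x += s.vx * dt
    s.y += s.vy * dt
    # Constant-power battery model: energy = power x time, converted from joules to watt-hours.
    s.battery_wh -= power_w * dt / 3600.0
    return s
```

A fleet-level simulation would run this update for every vehicle and route those whose `battery_wh` falls below a reserve threshold to a charging station; a realistic energy model would make power depend on speed, payload, and wind rather than staying constant.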
A Novel Battery Management & Charging Solution for Autonomous UAV Systems

Currently, one of the biggest limiting factors for long-term deployment of autonomous systems is the power constraint of a platform. In particular, for aerial robots such as unmanned aerial vehicles (UAVs), the energy resource is the main driver of mission planning and operation definitions, as everything revolves around flight time. The focus of this work is to develop a new method of energy storage and charging for autonomous UAV systems, for use during long-term deployments in a constrained environment. We developed a charging solution that allows pre-equipped UAV systems to land on designated charging pads and rapidly replenish their battery reserves through a contact charging point. This system is designed to work with all types of rechargeable batteries, with a focus on Lithium Polymer (LiPo) packs, and incorporates a battery management system for increased reliability. The project also explores optimization methods for fleets of UAV systems to increase charging efficiency and extend battery lifespans. Each component of this project was first designed and tested in computer simulation. Following positive feedback and results, prototypes for each part of the system were developed and rigorously tested. Results show that the contact charging method can charge LiPo batteries at a 1C rate, the industry-standard rate, while maintaining the same safety and efficiency standards as modern direct-connection chargers. Control software was also created for these base stations; it is designed to integrate with a fleet management system and optimizes UAV charge levels and distribution to extend LiPo battery lifetimes while still meeting expected mission demand. Each component of this project (hardware and software) was designed for manufacturing and implementation using industry-standard tools, making it well suited to large-scale implementations.
This system has been successfully tested with a fleet of UAV systems at Arizona State University, and is currently being integrated into an Arizona smart city environment for deployment.
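The 1C figure can be unpacked with a line of arithmetic: a C-rate expresses charging current relative to capacity, so 1C for a 5000 mAh pack is 5 A and, ignoring the constant-voltage taper at the end of a LiPo charge, corresponds to roughly one hour of charging. The third function sketches one possible charge-allocation policy for the base-station software: fully charge only as many UAVs as the expected demand requires and hold the rest at a partial storage level, since keeping idle LiPo packs at full charge is widely regarded as shortening their life. The function names, the greedy selection, and the 50% storage level are assumptions, not the optimization method used in this work.

```python
def charge_current_amps(capacity_mah, c_rate=1.0):
    """Charging current in amps: 1C is a current numerically equal to the capacity in amp-hours."""
    return capacity_mah / 1000.0 * c_rate

def ideal_charge_hours(c_rate=1.0):
    """Idealized time to charge from empty; real LiPo charging adds a constant-voltage taper."""
    return 1.0 / c_rate

def assign_charge_targets(state_of_charge, expected_demand, full=1.0, storage=0.5):
    """Greedy policy: the `expected_demand` UAVs closest to full are charged to `full`;
    the rest are held at a hypothetical `storage` state of charge."""
    ranked = sorted(state_of_charge, key=state_of_charge.get, reverse=True)
    return {uav: (full if rank < expected_demand else storage) for rank, uav in enumerate(ranked)}
```

For example, with packs at 20%, 90%, and 60% and an expected demand of two UAVs, the 90% and 60% packs are charged to full and the 20% pack is held at the storage level.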
Convolutional Neural Networks for Facial Expression Recognition

This paper presents a system for facial expression recognition (FER) built on deep convolutional neural networks (CNNs), along with tests of multiple configurations and methods. CNNs are able to extract powerful information about an image using multiple layers of generic feature detectors. The extracted information can be used to understand the image better by recognizing the different features present within it. Deep CNNs, however, can require training sets of more than a million images to fine-tune their feature detectors, and no facial expression dataset of that size is available. Because of this limited availability of the data required to train a new CNN, naïve domain adaptation is explored: instead of training a new CNN specifically to extract features related to FER, a CNN previously trained for another computer vision task is used. This research involved creating a system that can run a CNN, extract feature vectors from it, and classify these extracted features. Once this system was built, different aspects of it were tested and tuned. These aspects include the pre-trained CNN that was used, the layer from which features were extracted, the normalization applied to input images, and the training data for the classifier. Once properly tuned, the system produced more accurate results than previous attempts at facial expression recognition. Based on these positive results, naïve domain adaptation is shown to successfully leverage the advantages of deep CNNs for facial expression recognition.
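The stages the paper tunes (input normalization, feature extraction from a chosen layer, and a classifier trained on the extracted features) can be sketched without a deep-learning framework if the feature extractor is treated as a black box. Below, `extract_features` is a hypothetical stand-in for reading activations from a layer of a pre-trained CNN, the per-channel mean and standard deviation are placeholder values, and a nearest-centroid classifier stands in for whatever classifier the paper actually used.

```python
import numpy as np

def normalize_images(images, mean, std):
    """Per-channel standardization of a batch shaped (batch, height, width, channels)."""
    return (images - np.asarray(mean)) / np.asarray(std)

def extract_features(images):
    """Hypothetical stand-in for a pre-trained CNN layer: here, just the per-channel spatial mean."""
    return images.mean(axis=(1, 2))

class NearestCentroid:
    """Assigns each feature vector to the expression class with the closest mean feature vector."""

    def fit(self, features, labels):
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        self.centroids_ = np.stack([features[labels == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, features):
        # Euclidean distance from every sample to every class centroid.
        d = np.linalg.norm(features[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]
```

Swapping in a different pre-trained network, a different layer, or different normalization statistics only changes the first two functions, which is what makes this pipeline convenient for the kind of tuning described above.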
Computing Platform for Context Aware Smart Objects for Stroke Rehabilitation

In order to regain functional use of affected limbs, stroke patients must undergo intense, repetitive, and sustained exercises. As a result, it is common for stroke patients' recovery to suffer from mental fatigue and boredom. Serious games that reproduce the movements patients practice during rehabilitation sessions therefore present a promising way to mitigate patients' psychological exhaustion. This paper presents a system developed at the Center for Cognitive Ubiquitous Computing (CubiC) at Arizona State University that provides a platform for the development of serious games for stroke rehabilitation. The system consists of a network of nodes called Smart Cubes, based on the Raspberry Pi (Model B) computer, each with an array of sensors and actuators as well as communication modules that are used in-game. The Smart Cubes are modular, taking advantage of the Raspberry Pi's General Purpose Input/Output (GPIO) header, and can be augmented with additional sensors or actuators as game developers and stroke rehabilitation therapists require. Smart Cubes present advantages over traditional exercises, such as the capacity to provide many different forms of feedback and support for dynamically adapting games. Smart Cubes also present advantages over modern serious gaming platforms in their modularity, the flexibility resulting from their wireless network topology, and their independence from a monitor. Our contribution is a prototype of a Smart Cube network, a programmable computing platform, and a software framework specifically designed for the creation of serious games for stroke rehabilitation.
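The modularity and dynamic adaptation described above can be illustrated with a small software model. In the sketch below, every name is hypothetical: `SmartCube` models a cube whose sensors and actuators are registered as plain functions (on real hardware these would wrap drivers for devices on the GPIO header), and `adapt_hold_time` shows one simple way a game might adapt, lengthening or shortening a hypothetical "hold the cube still" duration based on the patient's recent success rate. The thresholds and step sizes are illustrative, not values from the paper.

```python
from typing import Callable, Dict

class SmartCube:
    """Hypothetical software model of one Smart Cube with pluggable sensors and actuators."""

    def __init__(self, cube_id: str):
        self.cube_id = cube_id
        self.sensors: Dict[str, Callable[[], float]] = {}
        self.actuators: Dict[str, Callable[[float], None]] = {}

    def attach_sensor(self, name: str, read: Callable[[], float]) -> None:
        # On hardware, `read` would wrap a driver for a device on the GPIO header.
        self.sensors[name] = read

    def attach_actuator(self, name: str, write: Callable[[float], None]) -> None:
        self.actuators[name] = write

    def poll(self) -> Dict[str, float]:
        """Read every attached sensor once."""
        return {name: read() for name, read in self.sensors.items()}

    def actuate(self, name: str, value: float) -> None:
        self.actuators[name](value)


def adapt_hold_time(current_s: float, success_rate: float,
                    step_s: float = 0.5, lo_s: float = 1.0, hi_s: float = 10.0) -> float:
    """Lengthen the required hold time when the patient succeeds often; shorten it when they struggle."""
    if success_rate > 0.8:
        current_s += step_s
    elif success_rate < 0.5:
        current_s -= step_s
    return min(hi_s, max(lo_s, current_s))
```

Because game logic only sees named readings from `poll()`, a therapist-requested sensor can be added to a cube without changing the games that do not use it.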
Exploring the Design of Vibrotactile Cues for Visio-Haptic Sensory Substitution

This paper presents the design and evaluation of a haptic interface for augmenting human-human interpersonal interactions by delivering the facial expressions of an interaction partner to an individual who is blind, using a visual-to-tactile mapping of facial action units and emotions. Pancake shaftless vibration motors are mounted on the back of a chair to provide vibrotactile stimulation in the context of a dyadic (one-on-one) interaction across a table. This work explores the design of spatiotemporal vibration patterns that can convey the basic building blocks of facial movements according to the Facial Action Coding System (FACS). A behavioral study was conducted to explore the factors that influence the naturalness of conveying affect using vibrotactile cues.
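A spatiotemporal vibration pattern can be represented as a timed sequence of motor pulses. The sketch below assumes a hypothetical 3 x 3 grid of motors on the chair back, indexed row by row, and builds a simple "sweep" in which the motors of one row fire in sequence. The grid size, the pulse and gap durations, and the mapping of action unit AU12 (Lip Corner Puller) to a bottom-row sweep are invented for illustration; they are not the patterns designed or evaluated in this work.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Pulse:
    start_ms: int     # onset relative to the start of the pattern
    motor: int        # index into a row-major grid of motors on the chair back
    duration_ms: int  # how long the motor vibrates

def row_sweep(rows: int, cols: int, row: int, direction: int = 1,
              pulse_ms: int = 100, gap_ms: int = 50) -> List[Pulse]:
    """Pulse the motors of one row in sequence: left to right if direction > 0, else right to left."""
    if not 0 <= row < rows:
        raise ValueError("row outside the motor grid")
    order = range(cols) if direction > 0 else range(cols - 1, -1, -1)
    pulses, t = [], 0
    for c in order:
        pulses.append(Pulse(t, row * cols + c, pulse_ms))
        t += pulse_ms + gap_ms  # next motor starts after this pulse plus a short gap
    return pulses

# Hypothetical mapping from a facial action unit to a pattern on a 3 x 3 grid.
AU_PATTERNS = {"AU12": row_sweep(3, 3, row=2)}
```

Parameters such as `pulse_ms` and `gap_ms` are the kind of factors a behavioral study could vary to see which settings make a pattern feel most natural.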
Utilizing Neural Networks to Predict Freezing of Gait in Parkinson's Patients

The artificial neural network is a form of machine learning that is highly effective at recognizing patterns in large, noisy datasets. These attributes make the neural network well suited as a mathematical basis for adaptability in personal biomedical devices. The purpose of this study was to determine the viability of neural networks in predicting Freezing of Gait (FoG), a symptom of Parkinson's disease in which the patient's legs are suddenly rendered unable to move. More specifically, a class of neural networks known as layered recurrent networks (LRNs) was applied to an open-source FoG experimental dataset donated to the Machine Learning Repository of the University of California at Irvine. The independent variables in this experiment (the subject being tested, the neural network architecture, and the sampling of the majority classes) were each varied and compared against the performance of the neural network in predicting future FoG events. Single-layered recurrent networks were found to be a viable method of predicting FoG events given the volume of training data available, though results varied significantly between patients. For the three patients tested, shank acceleration data was used to train networks with peak precision/recall values of 41.88%/47.12%, 89.05%/29.60%, and 57.19%/27.39%, respectively. These values were obtained for networks optimized using detection theory rather than for desired values of precision and recall. Furthermore, due to the nature of the experiments performed in this study, these values represent a lower bound on the performance of layered recurrent networks trained to detect gait freezing. As such, these values may be improved through a variety of measures.
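The evaluation vocabulary above can be pinned down in code. Precision is the fraction of predicted freezing events that were real, and recall is the fraction of real freezing events that were predicted. The abstract does not say which detection-theory criterion was used, so `youden_threshold` assumes one common choice, picking the decision threshold that maximizes TPR - FPR (Youden's J). `undersample_majority` shows one way the majority (non-freezing) class could be sampled, assuming freezing samples are the minority. All three functions are illustrative sketches, not the study's code.

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)

def youden_threshold(y_true, scores):
    """Return the score threshold that maximizes TPR - FPR (Youden's J) over the observed scores."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):
        pred = scores >= t
        tpr = np.sum(pred & y_true) / max(np.sum(y_true), 1)
        fpr = np.sum(pred & ~y_true) / max(np.sum(~y_true), 1)
        if tpr - fpr > best_j:  # strict comparison keeps the lowest threshold among ties
            best_t, best_j = float(t), tpr - fpr
    return best_t

def undersample_majority(X, y, rng=None):
    """Randomly drop non-freezing samples until both classes are the same size (assumes freezing is the minority)."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=bool)
    pos, neg = np.flatnonzero(y), np.flatnonzero(~y)
    keep_neg = rng.choice(neg, size=min(len(pos), len(neg)), replace=False)
    idx = np.sort(np.concatenate([pos, keep_neg]))  # preserve the original time order
    return X[idx], y[idx]
```

A threshold chosen by Youden's J balances sensitivity and specificity rather than targeting a particular precision or recall, which is consistent with the abstract's note that its precision/recall figures were not optimized directly.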
