Computation Offloading of Machine Learning Workloads at the Edge

Description
The number of IoT (Internet of Things) devices that will be deployed by the year 2030 is expected to exceed 125 billion. Such large volumes will be possible only if their design and maintenance costs are kept to an absolute minimum. For these reasons, IoT devices will have to be designed using mostly commercial-off-the-shelf (COTS) processors, whose performance and energy efficiency are much lower than those of an equivalent custom device. Compounding this situation is the increasing demand to use these devices for real-time data analysis and decision making, which require algorithms with substantial computational complexity. Offloading computations to cloud servers now incurs increasing delays and raises privacy and security concerns. In response to these issues, a new computing paradigm, called edge computing, is emerging. It involves a user-end device, which is the first recipient of the data, and a collection of local or physically nearby systems that have significantly more computational and storage capacity (referred to as cloudlets). The user-end device shares the computation with the cloudlet in a way that minimizes the energy consumption of the user-end device and/or the latency of the computation. This dissertation provides an optimization framework to partition machine learning workloads between the user-end device and the cloudlet with the goal of improving energy and performance. First, a method is presented to partition the layers of a deep neural network (DNN) between the user-end device and the cloudlet to minimize the energy consumption of the user-end device considering stochastic communication delays. Second, an energy minimization method is discussed that partitions a network of DNNs between devices considering the parallel execution of DNNs. Third, a delay-constrained energy minimization technique is presented to partition the network of DNNs between the devices.
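To make the first contribution concrete, the following is a minimal sketch of choosing a single split point in a layered DNN so that the expected device energy is smallest. It is not the dissertation's optimization framework. The function name `best_split`, the per-layer energy figures, the boundary sizes, the constant transmit power, and the use of sampled link rates to stand in for stochastic communication delay are all assumptions made for illustration.

```python
from statistics import mean


def best_split(layer_energy_j, boundary_bytes, tx_power_w, rate_samples_bps):
    """Pick the split k that minimizes expected device energy (hypothetical model).

    Layers [0, k) run on the user-end device; layers [k, n) run on the cloudlet.
    layer_energy_j[i]   : assumed device energy to run layer i, in joules
    boundary_bytes[k]   : assumed bytes sent when splitting at k
                          (k = 0 is the raw input, k = n is nothing or the final result)
    tx_power_w          : assumed constant radio transmit power, in watts
    rate_samples_bps    : sampled link rates in bits/s, modelling a stochastic channel
    """
    n = len(layer_energy_j)
    if len(boundary_bytes) != n + 1:
        raise ValueError("boundary_bytes needs one entry per possible split (n + 1)")

    # Expected transmit time for B bytes is 8 * B * E[1 / rate].
    expected_inv_rate = mean(1.0 / r for r in rate_samples_bps)

    best_k, best_energy = None, float("inf")
    compute_energy = 0.0
    for k in range(n + 1):
        if k > 0:
            compute_energy += layer_energy_j[k - 1]  # device now also runs layer k-1
        tx_energy = tx_power_w * 8 * boundary_bytes[k] * expected_inv_rate
        total = compute_energy + tx_energy
        if total < best_energy:
            best_k, best_energy = k, total
    return best_k, best_energy


if __name__ == "__main__":
    k, energy = best_split(
        layer_energy_j=[0.1, 0.2, 0.4],
        boundary_bytes=[1000, 100, 400, 0],
        tx_power_w=1.0,
        rate_samples_bps=[8000, 8000],
    )
    print(k, energy)
```

In this toy setting the first layer shrinks the data enough that running it locally and offloading the rest is cheapest. A latency constraint, as in the third contribution, could be added by discarding splits whose expected delay exceeds a bound.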
Date Created
2024

Elliptic Fourier Features for Robustness to Rotations and Translations in Neural Networks

Description
In image classification tasks, images are often corrupted by spatial transformations like translations and rotations. In this work, I utilize an existing method that uses the Fourier series expansion to generate a rotation and translation invariant representation of closed contours found in sketches, aiming to attenuate the effects of distribution shift caused by the aforementioned transformations. I use this technique to transform input images into one of two different invariant representations, a Fourier series representation and a corrected raster image representation, prior to passing them to a neural network for classification. The architectures used include convolutional neural networks (CNNs), multi-layer perceptrons (MLPs), and graph neural networks (GNNs). I compare the performance of this method to using data augmentation during training, the standard approach for addressing distribution shift, to see which strategy yields the best performance when evaluated against a test set with rotations and translations applied. I include experiments where the augmentations applied during training both do and do not accurately reflect the transformations encountered at test time. Additionally, I investigate the robustness of both approaches to high-frequency noise. In each experiment, I also compare training efficiency across models. I conduct experiments on three datasets: the MNIST handwritten digit dataset, a custom dataset (QD-3) consisting of three classes of geometric figures from the Quick, Draw! hand-drawn sketch dataset, and another custom dataset (QD-345) featuring sketches from all 345 classes found in Quick, Draw!. On the smaller problem space of MNIST and QD-3, the networks utilizing the Fourier-based technique to attenuate distribution shift perform competitively with the standard data augmentation strategy.
On the more complex problem space of QD-345, the networks using the Fourier technique do not achieve the same test performance as correctly-applied data augmentation. However, they still outperform instances where train-time augmentations mis-predict test-time transformations, and outperform a naive baseline model where no strategy is used to attenuate distribution shift. Overall, this work provides evidence that strategies which attempt to directly mitigate distribution shift, rather than simply increasing the diversity of the training data, can be successful when certain conditions hold.
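The idea of a contour representation that does not change under rotation and translation can be illustrated with a short sketch. Note that this uses plain complex Fourier descriptors, a simpler relative of the elliptic Fourier features named in the title, and is not the method used in the thesis. The function names `fourier_descriptor` and `transform` and the sample contour are hypothetical.

```python
import cmath
import math


def fourier_descriptor(points, n_coeffs=6):
    """Return coefficient magnitudes of a closed contour given as (x, y) samples.

    Each point is treated as a complex number x + iy and a discrete Fourier
    transform is taken along the contour. Skipping k = 0 (the centroid) removes
    the effect of translation; keeping only magnitudes removes the common phase
    factor that a rotation introduces.
    """
    n = len(points)
    if not 0 < n_coeffs < n:
        raise ValueError("n_coeffs must be between 1 and len(points) - 1")
    z = [complex(x, y) for x, y in points]
    mags = []
    for k in range(1, n_coeffs + 1):
        coeff = sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
        mags.append(abs(coeff))
    return mags


def transform(points, angle, dx, dy):
    """Rotate the contour by `angle` radians about the origin, then shift by (dx, dy)."""
    rot = cmath.exp(1j * angle)
    moved = [complex(x, y) * rot + complex(dx, dy) for x, y in points]
    return [(w.real, w.imag) for w in moved]


if __name__ == "__main__":
    # A hypothetical closed contour: a wobbly loop sampled at 16 points.
    contour = [
        ((2 + 0.5 * math.cos(3 * a)) * math.cos(a), (1 + 0.3 * math.sin(2 * a)) * math.sin(a))
        for a in (2 * math.pi * t / 16 for t in range(16))
    ]
    original = fourier_descriptor(contour)
    shifted = fourier_descriptor(transform(contour, angle=0.7, dx=5.0, dy=-3.0))
    print(max(abs(a - b) for a, b in zip(original, shifted)))  # close to zero
```

A vector like this could be fed to an MLP in place of raw pixels. Discarding phase also discards shape information, which is one reason the elliptic formulation normalizes the coefficients instead of dropping their phase.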
Date Created
2023

Vision-guided Policy Learning for Complex Tasks

Description
The field of computer vision has achieved tremendous progress over recent years with innovations in deep learning and neural networks. These advances have enabled an intelligent agent to understand the world from its visual observations to an unprecedented degree, such as recognizing an object, detecting the object's position, and estimating the distance to the object. This raises the question of how such visual understanding can be used to support the agent's decisions over its actions to perform a task. This dissertation studies this question and presents several methods that address the challenges of learning a desirable action policy from the agent's visual inputs so that the agent performs a task well. Specifically, this dissertation starts with learning an action policy from high-dimensional visual observations by improving the sample efficiency. The improved sample efficiency is achieved through a denser reward function defined upon the visual understanding of the task, and an efficient exploration strategy equipped with a hierarchical policy. It further studies the generalizable action policy learning problem. Generalizability is achieved for both a fully observable task, with local environment dynamics captured by visual representations, and a partially observable task, with global environment dynamics captured by a novel graph representation. Finally, this dissertation explores learning from human-provided priors, such as natural language instructions and demonstration videos, for better generalization ability.
Date Created
2021

Exploring Deep Learning for Video Understanding

Description
Video analysis and understanding have attracted increasing attention in recent years. The research community has also devoted considerable effort to, and made progress in, many related visual tasks, such as video action/event recognition, thumbnail frame or video index retrieval, and zero-shot learning. Finding good representative features of videos is an important objective for these visual tasks.

Thanks to the success of deep neural networks in recent vision tasks, it is natural to consider deep learning methods for better extraction of a global representation of images and videos. In general, a Convolutional Neural Network (CNN) is used to obtain spatial information, and a Recurrent Neural Network (RNN) is used to capture temporal information.

This dissertation provides a perspective on the challenging problems posed by different kinds of videos, which may require different solutions. Accordingly, several novel deep learning-based approaches to obtaining representative features are outlined for different visual tasks, including zero-shot learning, video retrieval, and video event recognition. To better understand and obtain the spatial and temporal information in videos, a Convolutional Neural Network and a Recurrent Neural Network are jointly utilized in most approaches. Experiments are conducted to demonstrate the importance and effectiveness of good representative features for gaining a better understanding of video clips in the computer vision field. This dissertation concludes with a discussion of possible future work on obtaining better representative features for more challenging video clips.
Date Created
2020