Full metadata
Title
Robust Networks: Neural Networks Robust to Quantization Noise and Analog Computation Noise Based on Natural Gradient
Description
Deep neural networks (DNNs) have had tremendous success in a variety of statistical learning applications due to their vast expressive power. Most applications run DNNs on the cloud on parallelized architectures, but there is a need for efficient DNN inference at the edge with low-precision hardware and analog accelerators. To make trained models more robust for this setting, quantization and analog compute noise are modeled as weight-space perturbations to DNNs, and an information theoretic regularization scheme is used to penalize the KL-divergence between perturbed and unperturbed models. This regularizer has similarities to both natural gradient descent and knowledge distillation, but has the advantage of explicitly promoting convergence of the network to a broader minimum that is robust to weight-space perturbations. In addition to the proposed regularization, the KL-divergence is directly minimized using knowledge distillation. Initial validation on FashionMNIST and CIFAR10 shows that the information theoretic regularizer and knowledge distillation outperform existing quantization schemes based on the straight-through estimator or L2-constrained quantization.
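A minimal sketch of the kind of KL-divergence robustness penalty described above, assuming a PyTorch-style training setup; the Gaussian weight noise, the sampled-copy formulation, and the names kl_robustness_penalty, noise_std, and lam are illustrative assumptions and are not taken from the thesis, which may use an analytical (natural-gradient-like) approximation instead of sampling.

```python
import copy

import torch
import torch.nn.functional as F


def kl_robustness_penalty(model, inputs, noise_std=0.01):
    """Penalize the KL-divergence between the outputs of the clean model
    and a copy whose weights receive additive Gaussian noise, standing in
    for quantization or analog compute noise (illustrative sketch)."""
    perturbed = copy.deepcopy(model)
    with torch.no_grad():
        for p in perturbed.parameters():
            p.add_(noise_std * torch.randn_like(p))
    log_p_clean = F.log_softmax(model(inputs), dim=-1)
    with torch.no_grad():
        log_p_noisy = F.log_softmax(perturbed(inputs), dim=-1)
    # D_KL(p_clean || p_noisy), averaged over the batch; gradients flow
    # through the clean model only.
    return F.kl_div(log_p_noisy, log_p_clean, reduction="batchmean",
                    log_target=True)


def training_loss(model, inputs, targets, lam=0.1):
    # Standard task loss plus the robustness penalty, weighted by lam
    # (lam is an assumed hyperparameter, not a value from the thesis).
    ce = F.cross_entropy(model(inputs), targets)
    return ce + lam * kl_robustness_penalty(model, inputs)
```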
Date Created
2019
Contributors
- Kadambi, Pradyumna (Author)
- Berisha, Visar (Thesis advisor)
- Dasarathy, Gautam (Committee member)
- Seo, Jae-Sun (Committee member)
- Cao, Yu (Committee member)
- Arizona State University (Publisher)
Topical Subject
Resource Type
Extent
83 pages
Language
eng
Copyright Statement
In Copyright
Primary Member of
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.I.55679
Level of coding
minimal
Note
Masters Thesis Computer Engineering 2019
System Created
- 2020-01-14 09:20:18
System Modified
- 2021-08-26 09:47:01
Additional Formats