Full metadata
Title
On-chip learning and inference acceleration of sparse representations
Description
The past decade has seen a tremendous surge in running machine learning (ML) functions on mobile devices, from mere novelty applications to now indispensable features for the next generation of devices.
While mobile platform capabilities range widely, long battery life and reliability are common design concerns that are crucial to remaining competitive.
Consequently, state-of-the-art mobile platforms have become highly heterogeneous, combining powerful CPUs with GPUs to accelerate the computation of deep neural networks (DNNs), which are the most common structures for performing ML operations.
However, traditional von Neumann architectures are not optimized for the high memory bandwidth and massively parallel computation that DNNs demand, propelling research into non-von Neumann architectures that can support these demands.
Re-imagining computer architectures for efficient DNN computation requires identifying and alleviating the prohibitive demands that DNNs present. The two central challenges for efficient computation are (1) the large memory storage and data movement caused by the DNN weights and (2) the massively parallel multiplications needed to compute the DNN output.
Introducing sparsity into DNNs, in which a certain percentage of either the weights or the outputs of the network are zero, greatly helps with both challenges. Combined with algorithm-hardware co-design that compresses the DNNs, sparsity is demonstrated to provide efficient solutions that greatly reduce the power consumption of the hardware that computes DNNs. Additionally, exploring emerging technologies such as non-volatile memories and 3-D stacking of silicon in conjunction with algorithm-hardware co-design architectures will pave the way for the next generation of mobile devices.
Towards the objectives stated above, our specific contributions include (a) an architecture based on a resistive crosspoint array that can update all stored values and compute matrix-vector multiplications in parallel within a single cycle, (b) a framework for training DNNs with block-wise sparsity to drastically reduce the memory storage and the total number of computations required to compute DNN outputs, (c) an exploration of hardware implementations of sparse DNNs and architectural guidelines for reducing the power consumption of such implementations in monolithic 3D integrated circuits, and (d) a prototype accelerator chip in 65nm CMOS for long short-term memory (LSTM) networks trained with the proposed block-wise sparsity scheme.
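To make the block-wise sparsity idea concrete, the following is a minimal, illustrative NumPy sketch of pruning a weight matrix in fixed-size blocks so that only the surviving blocks (and their indices) need to be stored and multiplied. The block size, keep ratio, and magnitude-based block-selection rule shown here are placeholder assumptions for illustration, not the training scheme defined in the dissertation.

import numpy as np

def block_sparsify(weights, block_size=4, keep_ratio=0.25):
    """Zero out all but the highest-magnitude blocks of a weight matrix.

    Illustrative only: block size, keep ratio, and the selection rule are
    assumptions, not the dissertation's actual block-wise training scheme.
    """
    rows, cols = weights.shape
    assert rows % block_size == 0 and cols % block_size == 0
    n_blk_r, n_blk_c = rows // block_size, cols // block_size

    # View the matrix as a grid of (block_size x block_size) blocks and
    # score each block by its Frobenius norm.
    blocks = weights.reshape(n_blk_r, block_size, n_blk_c, block_size)
    scores = np.linalg.norm(blocks, axis=(1, 3))  # shape (n_blk_r, n_blk_c)

    # Keep only the top fraction of blocks; zero out the rest.
    k = max(1, int(keep_ratio * scores.size))
    threshold = np.sort(scores, axis=None)[-k]
    mask = (scores >= threshold).astype(weights.dtype)

    # Expand the block-level mask back to element resolution and apply it.
    full_mask = np.kron(mask, np.ones((block_size, block_size)))
    return weights * full_mask

# Example: an 8x8 weight matrix pruned in 4x4 blocks with 25% of blocks kept,
# so only one block's values and its index would need on-chip storage.
w = np.random.randn(8, 8)
w_sparse = block_sparsify(w)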
Date Created
2019
Contributors
- Kadetotad, Deepak Vinayak (Author)
- Seo, Jae-Sun (Thesis advisor)
- Chakrabarti, Chaitali (Committee member)
- Vrudhula, Sarma (Committee member)
- Cao, Yu (Committee member)
- Arizona State University (Publisher)
Topical Subject
Resource Type
Extent
xvi, 142 pages : color illustrations
Language
eng
Copyright Statement
In Copyright
Primary Member of
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.I.54867
Statement of Responsibility
by Deepak Vinayak Kadetotad
Description Source
Viewed on August 24, 2020
Level of coding
full
Note
thesis
Partial requirement for: Ph.D., Arizona State University, 2019
bibliography
Includes bibliographical references
Field of study: Electrical engineering
System Created
- 2019-11-06 03:38:25
System Modified
- 2021-08-26 09:47:01
Additional Formats