Neuron-based Digital and Mixed-signal Circuit Design: From ASIC to SIMD Processors

Description
Among the many challenges facing circuit designers in deep sub-micron technologies, power, performance, area (PPA) and process variations are perhaps the most critical. Since existing strategies for reducing power and boosting the performance of circuit designs have matured to the point of saturation, it is necessary to explore unconventional alternatives. This investigation focuses on using perceptrons to enhance PPA in digital circuits and starts by constructing the perceptron using a combination of complementary metal-oxide-semiconductor (CMOS) and flash technology. The use of flash gives the perceptron variable delay and functionality, making it robust to process, voltage, and temperature variations. By replacing parts of an application-specific integrated circuit (ASIC) with these perceptrons, improvements of up to 30% in area and 20% in power can be achieved without affecting performance. Furthermore, the ability to vary the delay of a perceptron enables circuit designers to fix setup and hold-time violations post-fabrication, while reprogramming the functionality enables obfuscation of the circuits. The study extends to field-programmable gate arrays (FPGAs), showing that traditional FPGA architectures can also achieve improved PPA by replacing some look-up tables (LUTs) with perceptrons. Since replacing parts of traditional digital circuits provides significant improvements in PPA, a natural extension was to see whether dedicated circuits built with perceptrons as their compute units would lead to improvements in energy efficiency. This was demonstrated by developing perceptron-based compute elements and constructing an architecture from these elements for Quantized Neural Network acceleration. The resulting circuit delivered up to 50 times higher energy efficiency than a CMOS-based accelerator, without using standard low-power techniques such as voltage scaling and approximate computing.
Date Created
2023

On the Numerical Computation of Second Order Control Barrier Functions

Description
In recent years, the development of Control Barrier Functions (CBFs) has allowed safety guarantees to be placed on nonlinear control-affine systems. While powerful as a mathematical tool, CBF implementations on systems with high-relative-degree constraints can become too computationally intensive for real-time control. Such deployments typically rely on the analysis of a system's symbolic equations of motion, leading to large, platform-specific control programs that do not generalize well. To address this, a more generalized framework is needed. This thesis provides a formulation of second-order CBFs for rigid open kinematic chains. An algorithm for numerically computing the safe control input of a CBF is then introduced based on this formulation. It is shown that this algorithm can be used on a broad category of systems, with specific examples shown for convoy platooning, drone obstacle avoidance, and robotic arms with many degrees of freedom. These examples show up to a threefold improvement in computation time as well as a reduction in program size of 2-3 orders of magnitude.
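For readers unfamiliar with second-order CBFs, the following is a minimal sketch for a one-dimensional double integrator (position x, velocity v, commanded acceleration u). It is not the thesis's numerical algorithm: the system, the barrier h = x_max - x, the gains `a1` and `a2`, and the function names are all illustrative assumptions. For this toy system the second-order condition reduces in closed form to u <= -(a1 + a2) v + a1 a2 (x_max - x), so the safety filter simply clamps the nominal command to that bound.

```python
def safe_accel(x, v, u_nom, x_max, a1=2.0, a2=2.0):
    """Clamp a nominal acceleration so that h = x_max - x stays non-negative.

    Assumed toy model: x'' = u. With psi1 = h' + a1*h, the second-order
    condition psi1' + a2*psi1 >= 0 reduces to an upper bound on u.
    """
    bound = -(a1 + a2) * v + a1 * a2 * (x_max - x)
    return min(u_nom, bound)


def simulate(x0=0.0, v0=0.0, u_nom=1.0, x_max=1.0, dt=0.001, steps=10000):
    """Integrate the filtered system and return the largest position reached."""
    x, v = x0, v0
    peak = x
    for _ in range(steps):
        u = safe_accel(x, v, u_nom, x_max)
        v += u * dt          # semi-implicit Euler: update velocity first
        x += v * dt          # then position with the new velocity
        peak = max(peak, x)
    return peak


if __name__ == "__main__":
    print("peak position:", simulate())
```

The nominal controller here always pushes toward the limit; the filter lets it through while the bound is loose and takes over as the limit is approached.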
Date Created
2022

Eleatic: Secure Architecture Across the Edge-to-Cloud Continuum

Description
Many companies face pressure to deploy flexible compute infrastructures to manage their operations. However, the current developments in cloud and edge computing have created a data processing asymmetry challenge. On the edge, workloads frequently require low-latency responses, contend with connectivity and bandwidth instabilities, may require privacy guarantees, and may perform under limited or high-variance compute resources. In the cloud, workloads tolerate longer latency, expect highly available infrastructure, access high-performance compute resources, and have more power available, but may be further from where the processing results are needed. This compute asymmetry challenge requires a new computational paradigm. In this work, I advance a new computing architecture model, called the Continuum Computing Architecture (CCA), and validate this model with a candidate architecture. CCA is a unifying edge-fog-cloud computing model that provides the following capabilities: (i) a continuum of compute that spans from network-connected edge devices to the cloud – with very low power consumption to high-performance compute; (ii) same architecture with different micro-architectures along this compute continuum – a single RISC-V instruction set architecture with reconfigurable processing units; (iii) portability across all scales – the same program can be run across the continuum with different latencies and power utilizations; and (iv) secure shared memory features are fully-supported – physical memories along the continuum are abstracted to allow edge and cloud to share data in a transparent fashion. The validating architecture has three micro-architectures. The edge micro-architecture, Parmenides, targets accelerator-based edge processing system-on-chips (SoCs). Parmenides includes security features to protect the SoC in uncontrolled environments while adapting its power usage and processing to ambient events. 
The fog and cloud micro-architectures, Melissus and Zeno, must support application data distribution across the memory of many compute nodes to achieve the desired scale and performance. As a solution, I introduce the Eleatic Memory Model (EMM): a global shared memory architecture with hardware-supported global memory access permissions. All memory accesses are made with a Namespace-based capability scheme that supports improved scalability and memory security. The CCA model addresses several memory-centric security challenges including the misuse of resources, risk to application and data integrity, as well as concerns over authorization and confidentiality.
Date Created
2022

Reducing Delay, Power, and Area of a RISC-V Core Using Standard Cell Neurons

Description

In this thesis, I discuss the development of a novel physical design flow introducing standard-cell neurons for ASIC design. Standard-cell neurons are implemented on silicon as a circuit that realizes a threshold function. Each cell contains flash transistors, the threshold voltages of which correspond to the weights of the threshold function. Since the threshold voltages are programmed after fabrication, any sequential logic containing a standard-cell neuron is a logical black box upon delivery to the foundry. Additionally, previous research has shown significant reductions in delay, power, and area with the utilization of these flash transistor (FTL) cells. This paper aims to reinforce this prior research by demonstrating the first automatically synthesized, placed, and routed secure RISC-V core.
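As a purely behavioral illustration of the threshold function such a cell realizes, the sketch below evaluates a weighted sum of binary inputs against a threshold. The integer weights, thresholds, and function names are hypothetical; in the hardware described above the weights correspond to programmed flash threshold voltages, which this model does not attempt to capture.

```python
from itertools import product


def threshold_gate(inputs, weights, threshold):
    """Return 1 if the weighted sum of binary inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def truth_table(weights, threshold):
    """Enumerate the Boolean function computed for every input combination."""
    n = len(weights)
    return {
        bits: threshold_gate(bits, weights, threshold)
        for bits in product((0, 1), repeat=n)
    }


# The same 3-input cell under two hypothetical programmings.
majority = truth_table((1, 1, 1), 2)  # fires when at least two inputs are 1
and_gate = truth_table((1, 1, 1), 3)  # fires only when all three inputs are 1

if __name__ == "__main__":
    for bits in majority:
        print(bits, majority[bits], and_gate[bits])
```

Because the function depends only on the programmed weights and threshold, the netlist alone does not reveal which Boolean function the cell computes, which is the intuition behind the "logical black box" remark above.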

Date Created
2022-12

ARGOS: Adaptive Recognition and Gated Operation System for Real-time Vision Applications

Description
Deep neural network-based methods have been proven to achieve outstanding performance on object detection and classification tasks. Deep neural networks follow the "deeper model with deeper confidence" belief to gain higher recognition accuracy. However, reducing these networks' computational costs remains a challenge, which impedes their deployment on embedded devices. For instance, the intersection management of Connected Autonomous Vehicles (CAVs) requires running computationally intensive object recognition algorithms on low-power traffic cameras. This dissertation studies the effect of a dynamic hardware and software approach to address this issue. Characteristics of real-world applications can facilitate this dynamic adjustment and reduce the computation. Specifically, this dissertation starts with a dynamic hardware approach that adjusts itself based on the difficulty of the input and extracts deeper features if needed. Next, an adaptive learning mechanism is studied that uses features extracted from previous inputs to improve system performance. Finally, a system (ARGOS) is proposed and evaluated that can run on embedded systems while maintaining the desired accuracy. This system adopts shallow features at inference time, but it can switch to deep features if a higher accuracy is desired. To improve performance, ARGOS distills the temporal knowledge from deep features into the shallow system. Moreover, ARGOS further reduces computation by focusing on regions of interest. Response time and mean average precision are used to evaluate the proposed ARGOS system.
Date Created
2022

Improvements in Saliency Tracking for use in Brushless DC Motors

Description
Brushless DC (BLDC) motors are becoming increasingly common in various industrial and commercial applications such as micromobility and robotics due to their high torque density and efficiency. A BLDC motor is a three-phase synchronous motor that is very similar to a non-salient Permanent Magnet Synchronous Motor (PMSM), with the key differences lying in the non-ideal characteristics of the motor; the most prominent of these is that BLDC motors have a trapezoidal Back-Electromotive Force (BEMF). Despite their advantages, a present weakness of BLDC motors is the difficulty of controlling them at standstill and in low-speed conditions that require high torque. These operating conditions are common in the target applications and almost always necessitate the use of external sensors, which introduce additional costs and points of failure. As such, sensorless methods of position estimation would serve to improve system reliability, cost, and efficiency. High Frequency (HF) pulsating voltage injection in the direct axis is a popular method of sensorless control of salient-pole Interior-mount Permanent Magnet Synchronous Motors (IPMSM); however, existing methods are not sufficiently robust for use in BLDC motors and small Surface-mount Permanent Magnet Synchronous Motors (SPMSM) and are accompanied by other issues, such as acoustic noise. This thesis proposes novel improvements to the method of High Frequency Voltage Injection to allow for practical use in BLDC motors and small SPMSMs.
Proposed improvements include 1) a hybrid frequency generator which allows for dynamic frequency scaling to improve tracking and eliminate acoustic noise, 2) robust error calculation that is stable despite the non-ideal characteristics of BLDC Motors, 3) gain engineering of Proportional-Integral (PI) type Phase-Locked-Loop (PLL) trackers that further lend stability, 4) observer decoupling mechanism to allow for seamless transition into state-of-the-art BEMF sensing methods at high speed, and 5) saliency boosting that allows for continuous tracking of saliency under high torque load. Experimental tests with a quadrature encoder and torque efficiency calculations on a dynamometer verify the practicality of the proposed algorithm and improvements.
Date Created
2021

Making a Real-Time Operating System for the Raspberry Pi 2B

Description
Real-Time Operating Systems are used in a variety of applications ranging from autonomous vehicles, flight controllers, and energy management systems to pacemakers, satellite tracking systems, amateur robotics and much more. While general-purpose computers can perform tasks quite quickly, the execution time of a given process varies noticeably from one run to the next. This variation poses a big challenge for many computer-controlled systems that operate in the real world, such as robots, autonomous vehicles, drones, and traffic signals. It matters because these systems must interact with the real world and perform actions at the proper times; executing tasks at other times can have effects ranging from a minor inconvenience to catastrophic failure. Many of these real-time systems, such as pacemakers, are built around single-board computers. One single-board computer that is popular among hobbyists due to its form factor, cost, and performance is the Raspberry Pi, which uses an ARM-based processor. To provide a Real-Time Operating System for this single-board computer, this paper presents Jobbed, a single-core Real-Time Operating System that uses a fixed-priority preemptive scheduler and targets the Raspberry Pi 2B. In this paper, we present the algorithmic structure behind this system and compare it to the Raspbian Operating System in an array of performance and behavioral tests targeted at proper Real-Time Operating Systems.
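To make the idea of fixed-priority preemptive scheduling concrete, here is a small tick-based simulation. It is a generic sketch, not Jobbed's implementation: the task tuples, the convention that a lower number means higher priority, and the `simulate` function are assumptions made for illustration only.

```python
import heapq


def simulate(tasks, horizon):
    """Simulate a single-core fixed-priority preemptive scheduler.

    tasks: list of (name, priority, release_tick, burst_ticks) with burst >= 1;
           a lower priority number means a higher priority (assumed convention).
    Returns the name of the task that ran in each tick (None when idle).
    """
    pending = sorted(tasks, key=lambda t: t[2])  # order by release time
    ready = []                                   # min-heap of (priority, name)
    remaining = {}
    timeline = []
    i = 0
    for tick in range(horizon):
        # Release every task whose release time has arrived.
        while i < len(pending) and pending[i][2] <= tick:
            name, prio, _, burst = pending[i]
            remaining[name] = burst
            heapq.heappush(ready, (prio, name))
            i += 1
        if not ready:
            timeline.append(None)
            continue
        # Always run the highest-priority ready task; a newly released
        # higher-priority task therefore preempts the current one.
        _, name = ready[0]
        timeline.append(name)
        remaining[name] -= 1
        if remaining[name] == 0:
            heapq.heappop(ready)
    return timeline


if __name__ == "__main__":
    demo = [("logger", 3, 0, 4), ("control", 1, 2, 2)]
    print(simulate(demo, 7))
```

In the demo, the low-priority `logger` task starts first, is preempted when `control` is released at tick 2, and resumes once `control` finishes.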
Date Created
2022-05

Reduced Order Models and Approximations for Hardware Acceleration of Neural Networks

Description
Many real-world engineering problems require simulations to evaluate the design objectives and constraints. Often, due to the complexity of the system model, simulations can be prohibitive in terms of computation time. One approach to overcome this issue is to construct a surrogate model, which approximates the original model. The focus of this work is on data-driven surrogate models, in which empirical approximations of the output are performed given the input parameters. Recently, neural networks (NNs) have re-emerged as a popular method for constructing data-driven surrogate models. Although NNs have achieved excellent accuracy and are widely used, they pose their own challenges. This work addresses two common challenges, the need for: (1) hardware acceleration and (2) uncertainty quantification (UQ) in the presence of input variability. The high demand for deep NN inference on cloud servers and edge devices calls for the design of low-power custom hardware accelerators. The first part of this work describes the design of an energy-efficient long short-term memory (LSTM) accelerator. The overarching goal is to aggressively reduce the power consumption and area of the LSTM components using approximate computing, and then use architectural-level techniques to boost the performance. The proposed design is synthesized, placed, and routed as an application-specific integrated circuit (ASIC). The results demonstrate that this accelerator is 1.2X more energy-efficient and 3.6X more area-efficient than the baseline LSTM. In the second part of this work, a robust framework is developed based on an alternate data-driven surrogate model referred to as polynomial chaos expansion (PCE) for addressing UQ. In contrast to many existing approaches, no assumptions are made on the elements of the function space and UQ is a function of the expansion coefficients.
Moreover, the sensitivity of the output with respect to any subset of the input variables can be computed analytically by post-processing the PCE coefficients. This provides a systematic and incremental method for pruning or changing the order of the model. This framework is evaluated on several real-world applications from different domains and is extended to classification tasks as well.
Date Created
2021

Hardware-friendly Deep Learning for Edge Computing

Description
The Internet-of-Things (IoT) is generating a vast and growing amount of streaming data. However, even considering the growth of the cloud computing infrastructure, IoT devices will generate two orders of magnitude more data than centralized data center servers can process or store. This trend inevitably calls for offloading IoT data processing to a decentralized edge computing infrastructure. On the other hand, deep-learning-based applications have made great progress by taking advantage of heavy centralized computing resources for training large models to fit increasingly complicated tasks. Even though large-scale deep learning models perform well in terms of accuracy, their high computational complexity makes it impossible to offload them onto edge devices for real-time inference and timely response. To enable timely IoT services on edge devices, this dissertation addresses the challenge from two perspectives. On the hardware side, a new field-programmable gate array (FPGA)-based framework for binary neural networks and an application-specific integrated circuit (ASIC) accelerator for natural scene text interpretation are proposed, with awareness of the computing resources and power constraints at the edge. On the algorithm side, this work presents methodologies both for building more compact models and for finding better computation-accuracy trade-offs for existing models.
Date Created
2021

Pool Level Monitor and Autofill System: A Smart Home Device

Description

As smart home devices become more common in households across the globe, it is surprising that companies who specialize in IoT devices have not exploited the world of swimming pools. As a pool owner and avid IoT user, it has become increasingly obvious to me that such devices are necessary. Thus, I have developed an embedded system – connected to a web-based reporting system – that accurately reports common chemical levels of a swimming pool. In addition, this system includes an autofill function with information about the amount of water dispensed. This system gives pool owners access to an all-in-one device that can be used on any pool, new or old. Future implementations include a personalized application to display the pool levels and user-defined suggestions when certain levels become too high or low.

Date Created
2021-05