Semiconductor Memory Applications in Radiation Environment, Hardware Security and Machine Learning System

Description
Semiconductor memory is a key component of computing systems. Beyond conventional memory and data storage applications, this dissertation explores both mainstream and emerging non-volatile memory (eNVM) technologies for radiation environments, hardware security systems, and machine learning applications.

In radiation environments, e.g. aerospace, memory devices are exposed to various energetic particles. As these energetic particles pass through a semiconductor device, they can generate electron-hole pairs (directly or indirectly), resulting in a photo-induced current that may change the memory state. First, the trend of radiation effects in mainstream memory technologies with technology node scaling is reviewed. Then, single-event effects in oxide-based resistive random access memory (RRAM), one of the eNVM technologies, are investigated from the circuit level to the system level.

The physical unclonable function (PUF) has been widely investigated as a promising hardware security primitive that exploits the inherent randomness of a physical system (e.g. intrinsic semiconductor manufacturing variability). In this dissertation, two RRAM-based PUF implementations are proposed, for cryptographic key generation (weak PUF) and device authentication (strong PUF), respectively. The performance of the RRAM PUFs is evaluated with experiments and simulations. The impact of non-ideal circuit effects on PUF performance is also investigated, and optimization strategies are proposed to mitigate these effects. In addition, the resistance against modeling and machine learning attacks is analyzed.
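
As an illustration of the weak-PUF idea, the sketch below shows how device-to-device RRAM resistance variability can be digitized into key bits by comparing fixed, disjoint pairs of cells. It is a generic sketch under assumed resistance statistics, not the dissertation's readout circuit.

```python
import numpy as np

def weak_puf_key(resistances, seed=0):
    """Illustrative weak-PUF key generation from RRAM resistance variability.

    Each key bit comes from comparing the read resistances of a fixed,
    disjoint pair of cells. The pairing is public configuration; the
    resistances are chip-specific, so the key differs from chip to chip
    but is reproducible on the same chip.
    """
    rng = np.random.default_rng(seed)            # fixed, reproducible pairing
    order = rng.permutation(len(resistances))
    pairs = order.reshape(-1, 2)
    return (resistances[pairs[:, 0]] > resistances[pairs[:, 1]]).astype(int)

# Hypothetical 128-cell array with log-normal resistance spread (made-up statistics)
cells = np.random.lognormal(mean=np.log(10e3), sigma=0.3, size=128)
key = weak_puf_key(cells)
print("".join(map(str, key)))                    # 64-bit key
```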

Deep neural networks (DNNs) have shown remarkable improvements in various intelligent applications such as image classification, speech classification, and object localization and detection, and increasing efforts have been devoted to developing hardware accelerators for them. In this dissertation, two compute-in-memory (CIM) hardware accelerator designs, based on SRAM and eNVM technologies, are proposed for two binary neural networks, the hybrid BNN (HBNN) and the XNOR-BNN, respectively, which are targeted at hardware-resource-limited platforms such as edge devices. These designs feature high throughput, scalability, low latency, and high energy efficiency. Finally, the proposed designs with SRAM technology have been successfully taped out and validated in TSMC 65 nm.
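
In an XNOR-BNN, a multiply-accumulate reduces to a bitwise XNOR followed by a popcount, which is the operation a CIM bit-cell array evaluates along its bit-lines. Below is a minimal software sketch of that arithmetic (not the taped-out design), assuming weights and activations in {-1, +1} encoded as 0/1 bits.

```python
import numpy as np

def xnor_popcount_dot(w_bits, a_bits):
    """Binary dot product used in XNOR-BNNs.

    w_bits, a_bits: arrays of 0/1 encoding weights/activations in {-1, +1}
    (1 -> +1, 0 -> -1). The {-1, +1} dot product equals
    2 * popcount(XNOR(w, a)) - N.
    """
    n = len(w_bits)
    xnor = np.logical_not(np.logical_xor(w_bits, a_bits))
    return 2 * int(np.count_nonzero(xnor)) - n

# Cross-check against the equivalent real-valued dot product
w = np.random.randint(0, 2, 64)
a = np.random.randint(0, 2, 64)
assert xnor_popcount_dot(w, a) == int(np.dot(2 * w - 1, 2 * a - 1))
```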

Overall, this dissertation paves the path for new applications of memory technologies toward secure and energy-efficient artificial intelligence systems.
Date Created
2018
Agent

Ultra-low Quiescent Current NMOS Low Dropout Regulator With Fast Transient Response for Always-On Internet-of-Things Applications

Description
The increased adoption of the Internet-of-Things (IoT) for various applications like smart home, industrial automation, connected vehicles, medical instrumentation, etc. has resulted in a large-scale distributed network of sensors, accompanied by their power supply regulator modules and control and data transfer circuitry. Depending on the application, the sensor location can be virtually anywhere, and the sensors are therefore typically powered by a localized battery. To ensure a long battery life without replacement, the power consumption of the sensor nodes, the supply regulator, and the control and data transmission unit needs to be very low. The power consumption of the sensing, control, and data transmission blocks is typically reduced by duty-cycled operation, such that they are on only for short bursts of time or turn on only on a trigger event and are otherwise powered down. These approaches reduce their power consumption significantly, and the overall system power is therefore dominated by the consumption of the always-on supply regulator.

Besides having low power consumption, supply regulators for such IoT systems also need a fast transient response to load current changes during duty-cycled operation. Supply regulation using low quiescent current low dropout (LDO) regulators helps extend the battery life of such power-aware, always-on applications with very long standby times. To serve as a supply regulator for such applications, a 1.24 µA quiescent current NMOS low dropout regulator (LDO) is presented in this dissertation. This LDO uses a hybrid bias current generator (HBCG) to boost its bias current and improve the transient response. A scalable bias-current error amplifier with an on-demand buffer drives the NMOS pass device. The error amplifier is powered by an integrated dynamic-frequency charge pump to ensure a low dropout voltage, and a low-power relaxation oscillator (LPRO) generates the charge pump clocks. A switched-capacitor pole tracking (SCPT) compensation scheme is proposed to ensure stability up to the maximum load current of 150 mA over a low-ESR output capacitor range of 1-47 µF. Designed in a 0.25 µm CMOS process, the LDO has an output voltage range of 1 V to 3 V, a dropout voltage of 240 mV, and a core area of 0.11 mm².
Date Created
2018
Agent

Joint Optimization of Quantization and Structured Sparsity for Compressed Deep Neural Networks

Description
Deep neural networks (DNN) have shown tremendous success in various cognitive tasks, such as image classification and speech recognition. However, their usage on resource-constrained edge devices has been limited by their high computation and large memory requirements.

To overcome these challenges, recent works have extensively investigated model compression techniques such as element-wise sparsity, structured sparsity, and quantization. While most of these works apply the compression techniques in isolation, there have been very few studies on applying quantization and structured sparsity together to a DNN model.

This thesis co-optimizes structured sparsity and quantization constraints on DNN models during training. Specifically, it obtains an optimal setting of 2-bit weights and 2-bit activations coupled with 4X structured compression by performing a combined exploration of the quantization and structured-compression settings. The optimal DNN model achieves a 50X weight-memory reduction compared to an uncompressed floating-point DNN. This saving is significant, since applying only the structured sparsity constraint achieves 2X memory savings and applying only the quantization constraint achieves 16X. The algorithm has been validated on both high- and low-capacity DNNs and on wide-sparse and deep-sparse DNN models. Experiments demonstrate that a deep-sparse DNN outperforms a shallow-dense DNN, with the level of memory savings varying with the DNN precision and sparsity levels. This work further proposes a Pareto-optimal approach to systematically extract optimal DNN models from a huge set of sparse and dense DNN models. The resulting 11 optimal designs are further evaluated by considering the overall DNN memory, which includes both activation memory and weight memory. The memory footprint of the optimal designs corresponding to the low-sparsity DNNs changes only slightly; however, activation memory cannot be ignored for high-sparsity DNNs.
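
The training-time co-optimization itself is not reproduced here, but the sketch below illustrates how a structured (channel-level) sparsity mask and a 2-bit uniform quantizer compose on a weight tensor, and the raw weight-storage arithmetic that follows. The layer shape, pruning criterion, and quantizer are assumptions for illustration; the toy count gives 64X, whereas the thesis reports 50X for its actual models.

```python
import numpy as np

def quantize_uniform(w, bits=2):
    """Symmetric uniform quantization to `bits` bits (illustrative, not QAT)."""
    qmax = 2 ** (bits - 1) - 1                       # +1 for 2-bit
    scale = np.max(np.abs(w)) / qmax + 1e-12
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def prune_channels(w, keep_ratio=0.25):
    """Structured (output-channel) pruning: keep the top channels by L1 norm."""
    norms = np.abs(w).sum(axis=tuple(range(1, w.ndim)))
    k = max(1, int(keep_ratio * w.shape[0]))
    keep = np.argsort(norms)[-k:]
    mask = np.zeros(w.shape[0], dtype=bool)
    mask[keep] = True
    return w * mask.reshape(-1, *([1] * (w.ndim - 1))), mask

# Hypothetical conv layer: 64 output channels, 32 input channels, 3x3 kernels
w = np.random.randn(64, 32, 3, 3).astype(np.float32)
w_sparse, mask = prune_channels(w, keep_ratio=0.25)  # 4X structured compression
w_q = quantize_uniform(w_sparse, bits=2)             # 2-bit weights

fp32_bits = w.size * 32
compressed_bits = int(mask.sum()) * 32 * 3 * 3 * 2   # surviving channels at 2 bits/weight
print(f"weight-memory reduction: {fp32_bits / compressed_bits:.0f}X")
```
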
Date Created
2018
Agent

High Performance Power Management Integrated Circuits for Portable Devices

Description
Portable devices often require multiple power management ICs (PMICs) to power different sub-modules. Li-ion batteries are well suited for portable devices because of their small size, high energy density, and long cycle life. Since the Li-ion battery is the major power source for portable devices, fast and high-efficiency battery charging solutions have become a major requirement in portable device applications.

In the first part of the dissertation, a high-performance Li-ion switching battery charger is proposed. A cascaded two-loop (CTL) control architecture is used for a seamless CC-CV transition, and a time-based technique is utilized to minimize the controller area and power consumption. The time-domain controller is implemented using a voltage-controlled oscillator (VCO) and a voltage-controlled delay line (VCDL). Several efficiency-improvement techniques, such as a segmented power FET, quasi-zero voltage switching (QZVS), and switching frequency reduction, are proposed. The proposed switching battery charger provides a maximum charging current of 2 A with a peak efficiency of 93.3%. Configured as a boost converter, the charger provides a maximum charging current of 1.5 A while achieving a 96.3% peak efficiency.
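
The CC-CV handoff that the cascaded two-loop controller automates can be summarized behaviorally in a few lines; the sketch below is a generic CC-CV policy with assumed thresholds and gains, not the charger's actual control loop.

```python
def cc_cv_step(v_batt, i_limit=2.0, v_float=4.2, k_cv=5.0):
    """One control step of a generic CC-CV Li-ion charging policy (illustrative).

    Constant-current (CC) phase: source the current limit until the battery
    reaches the float voltage. Constant-voltage (CV) phase: taper the current
    to hold v_batt near v_float. A cascaded two-loop controller performs this
    handoff seamlessly in hardware; this only captures the behavioral intent.
    """
    if v_batt < v_float:
        return i_limit                               # CC region
    i_cmd = max(0.0, i_limit - k_cv * (v_batt - v_float))
    return min(i_limit, i_cmd)                       # CV taper

# Example sweep of battery voltage through the CC-CV transition
for v in (3.6, 4.0, 4.19, 4.21, 4.25):
    print(f"V_batt={v:.2f} V -> I_chg={cc_cv_step(v):.2f} A")
```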

The second part of the dissertation presents a digital low dropout regulator (DLDO) for systems-on-chip (SoCs) in portable device applications. The proposed DLDO achieves a faster transient settling time, lower undershoot/overshoot, and higher PSR than the state of the art. Because of its good PSR performance, the proposed DLDO is able to power mixed-signal loads. To achieve a fast load transient response, a load transient detector (LTD) enables a boost-mode operation of the digital PI controller. The boost-mode operation achieves a sub-microsecond settling time, reducing the settling time by 50% to 250 ns, the undershoot by 35% to 250 mV, and the overshoot by 17% to 125 mV, without compromising system stability.
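
A rough behavioral picture of a digital PI controller whose gains are boosted when a load-transient detector fires is sketched below; the gain values, detection threshold, and interface are assumptions for illustration, not the actual design.

```python
class BoostModePI:
    """Illustrative digital PI controller with a load-transient boost mode.

    In normal mode the controller uses small gains for low activity; when
    the load-transient detector (LTD) flags a large error, the gains are
    temporarily boosted to recover the output quickly.
    """
    def __init__(self, kp=0.1, ki=0.02, boost=8.0, ltd_threshold=0.05):
        self.kp, self.ki, self.boost = kp, ki, boost
        self.ltd_threshold = ltd_threshold
        self.integ = 0.0

    def step(self, v_ref, v_out):
        err = v_ref - v_out
        boosted = abs(err) > self.ltd_threshold      # LTD: large droop detected
        gain = self.boost if boosted else 1.0
        self.integ += self.ki * gain * err
        return self.kp * gain * err + self.integ     # e.g. pass-device strength code

ctrl = BoostModePI()
print(ctrl.step(v_ref=1.0, v_out=0.90))   # large droop -> boost mode
print(ctrl.step(v_ref=1.0, v_out=0.99))   # small error -> normal mode
```
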
Date Created
2018
Agent

Design of Resistive Synaptic Devices and Array Architectures for Neuromorphic Computing

Description
Over the past few decades, silicon complementary metal-oxide-semiconductor (CMOS) technology has been greatly scaled down to achieve higher performance and density and lower power consumption. As device dimensions approach their fundamental physical limits, there is an increasing demand for the exploration of emerging devices with operating principles distinct from conventional CMOS. In recent years, many efforts have been devoted to the research of next-generation emerging non-volatile memory (eNVM) technologies, such as resistive random access memory (RRAM) and phase change memory (PCM), to replace conventional digital memories (e.g. SRAM) for the implementation of synapses in large-scale neuromorphic computing systems.

Being essentially compact and “analog”, these eNVM devices in a crossbar array can compute vector-matrix multiplication in parallel, significantly speeding up machine/deep learning algorithms. However, non-ideal eNVM device and array properties may hamper the learning accuracy. To quantify their impact, the sparse coding algorithm was used as a starting point; strategies to remedy the accuracy loss were proposed, and the circuit-level design trade-offs were analyzed. At the architecture level, a parallel “pseudo-crossbar” array was presented to prevent the write disturbance issue, and the peripheral circuits to support various parallel array architectures were designed. One key component is the read circuit, which employs the principle of the integrate-and-fire neuron model to convert the analog column current into a digital output. Because this read circuit is not area-efficient, it is proposed to replace it with a compact two-terminal oscillation neuron device that exhibits the metal-insulator transition phenomenon.
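
The parallel vector-matrix multiplication mentioned above follows from Ohm's law and Kirchhoff's current law in the crossbar: each bit-line current is the dot product of the input voltage vector and that column's conductances. Below is a minimal, idealized numerical sketch (ignoring wire resistance, device non-linearity, and other non-idealities discussed above), with made-up conductance and voltage values.

```python
import numpy as np

def crossbar_vmm(voltages, conductances):
    """Idealized crossbar vector-matrix multiply.

    voltages:     shape (rows,)       input vector applied on the word lines [V]
    conductances: shape (rows, cols)  programmed eNVM conductances [S]
    Returns the bit-line currents I_j = sum_i V_i * G_ij [A], i.e. one analog
    dot product per column, evaluated in parallel in the hardware array.
    """
    return voltages @ conductances

rows, cols = 4, 3
G = np.random.uniform(1e-6, 1e-4, size=(rows, cols))   # hypothetical conductance range
V = np.array([0.2, 0.0, 0.2, 0.2])                     # read voltages encoding the input
print(crossbar_vmm(V, G))                              # column currents (A)
```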

To facilitate the design exploration, a circuit-level macro simulator, “NeuroSim”, was developed in C++ to estimate the area, latency, energy, and leakage power of various neuromorphic architectures. NeuroSim provides a wide variety of design options at the circuit and device levels and can be used alone or as a supporting module to provide circuit-level performance estimation within neural network algorithms. A 2-layer multilayer perceptron (MLP) simulator integrated with NeuroSim was demonstrated to evaluate both the learning accuracy and the circuit-level performance metrics for online learning and offline classification, as well as to study the impact of eNVM reliability issues, such as data retention and write endurance, on the learning performance.
Date Created
2018
Agent

Embedding Logic and Non-volatile Devices in CMOS Digital Circuits for Improving Energy Efficiency

Description
Static CMOS logic has remained the dominant design style of digital systems for more than four decades due to its robustness and near zero standby current. Static CMOS logic circuits consist of a network of combinational logic cells and clocked sequential elements, such as latches and flip-flops, that are used for sequencing computations over time. The majority of the digital design techniques to reduce power, area, and leakage over the past four decades have focused almost entirely on optimizing the combinational logic. This work explores alternate architectures for the flip-flops for improving the overall circuit performance, power, and area. It consists of three main sections.

First is the design of a multi-input configurable flip-flop structure with embedded logic. A conventional D-type flip-flop may be viewed as realizing an identity function, in which the output is simply the value of the input sampled at the clock edge. In contrast, the proposed multi-input flip-flop, named PNAND, can be configured to realize one of a family of Boolean functions called threshold functions. In essence, the PNAND is a circuit implementation of the well-known binary perceptron. Unlike other reconfigurable circuits, a PNAND can be configured by simply changing the assignment of signals to its inputs. Using a standard cell library of such gates, a technology mapping algorithm can be applied to transform a given netlist into one with an optimal mixture of conventional logic gates and threshold gates. This approach was used to fabricate a 32-bit Wallace tree multiplier and a 32-bit Booth multiplier in 65 nm LP technology. Simulation and chip measurements show more than 30% improvement in dynamic power and more than 20% reduction in core area.
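
A threshold function outputs 1 exactly when a weighted sum of its binary inputs reaches a threshold, which is the binary perceptron the PNAND realizes in circuit form. The small sketch below evaluates such functions; the weights and thresholds are chosen only for illustration.

```python
def threshold_gate(inputs, weights, threshold):
    """Evaluate a Boolean threshold function: 1 iff sum(w_i * x_i) >= T.

    This is the binary perceptron that a PNAND-style cell realizes; different
    Boolean functions are obtained by re-assigning signals to inputs of
    different weight rather than by redesigning the cell.
    """
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

# A 3-input majority function is the threshold function with weights (1,1,1), T = 2
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert threshold_gate((a, b, c), (1, 1, 1), 2) == int(a + b + c >= 2)

# AND(a, b) as a threshold function with weights (1, 1) and T = 2
print(threshold_gate((1, 1), (1, 1), 2), threshold_gate((1, 0), (1, 1), 2))
```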

The functional yield of the PNAND reduces with geometry and voltage scaling. The second part of this research investigates the use of two mechanisms to improve the robustness of the PNAND circuit architecture. One is the use of forward and reverse body biases to change the device threshold, and the other is the use of RRAM devices for low voltage operation.

The third part of this research focuses on the design of flip-flops with non-volatile storage. Spin-transfer torque magnetic tunnel junctions (STT-MTJs) are integrated with both the conventional D flip-flop and the PNAND circuits to implement non-volatile logic (NVL). These non-volatile storage-enhanced flip-flops are able to save the state of the system locally when a power interruption occurs. However, manufacturing variations in the STT-MTJs and in the CMOS transistors significantly reduce the yield, leading to an overly pessimistic design and, consequently, higher energy consumption. A detailed analysis of the design trade-offs in the driver circuitry for performing backup and restore, and a novel method to design the energy-optimal driver for a given yield, are presented. Efficient designs of two non-volatile flip-flop (NVFF) circuits are presented, in which the backup time is determined on a per-chip basis, minimizing the energy wastage while satisfying the yield constraint. To achieve a yield of 98%, the conventional approach would have to expend nearly 5X more energy than the minimum required, whereas the proposed tunable approach expends only 26% more energy than the minimum. A non-volatile threshold gate architecture, NV-TLFF, is designed with the same backup and restore circuitry in 65 nm technology. The embedded logic in the NV-TLFF compensates for the performance overhead of NVL, which leads to the possibility of zero-overhead non-volatile datapath circuits. An 8-bit multiply-and-accumulate (MAC) unit is designed to demonstrate the performance benefits of the proposed architecture. Based on the results of HSPICE simulations, the MAC circuit with the proposed NV-TLFF cells is shown to consume at least 20% less power and area compared to the circuit designed with conventional DFFs, without sacrificing any performance.
Date Created
2018
Agent

Load Sharing Low Dropout Regulators Using Accurate Current Sensing

Description
The growing demand for high-performance, power-hungry portable electronic devices has resulted in alarmingly serious thermal concerns in recent times. The power management system of such devices has thus become increasingly vital. An integral component of this system is the low-dropout regulator (LDO), which inherently generates a low-noise power supply. Such power supplies are crucial for noise-sensitive analog blocks like analog-to-digital converters, phase-locked loops, radio-frequency circuits, etc. At higher output power, however, a single LDO suffers from increased heat dissipation, leading to thermal issues.

This research presents a novel approach to equally and accurately share a large output load current across multiple parallel LDOs so as to spread the dissipated heat uniformly. The proposed techniques to achieve a high load-sharing accuracy of 1% include an innovative, fully integrated, accurate current-sensing technique based on dynamic element matching and an integrator-based servo loop with a low-offset feedback amplifier. A novel compensation scheme based on a switched-capacitor resistor is presented to address the high 2 A output current specification per LDO across an output voltage range of 1 V to 3 V. The presented scheme also reduces the stringent requirements on off-chip board traces and the number of off-chip components, making it suitable for portable hand-held systems. The proposed approach can theoretically be extended to any number of parallel LDOs, extending the output current range considerably. The designed load-sharing LDO features a fast transient response for a low quiescent current consumption of 300 µA, with a power-supply rejection of 60.7 dB at DC. The proposed load-sharing technique is verified through extensive simulations for various sources and ranges of mismatch across process, voltage, and temperature.
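
The dynamic element matching idea behind the accurate current sensing can be illustrated with a toy model of a segmented power FET: rotating which segment serves as the sense element and averaging the readings cancels per-segment mismatch. This is a generic illustration under assumed mismatch statistics, not the actual sensing circuit.

```python
import numpy as np

def segment_currents(i_total, width_errors):
    """Current split among N nominally identical power-FET segments.
    Each segment carries a share proportional to its (mismatched) width."""
    widths = 1.0 + width_errors
    return i_total * widths / widths.sum()

def dem_sensed_current(i_total, width_errors):
    """Illustrative DEM sensing: rotate which segment is the sense element
    and average the N scaled readings; the rotation averages out the
    per-segment mismatch, so the estimate converges to the true current."""
    n = len(width_errors)
    i_seg = segment_currents(i_total, width_errors)
    estimates = n * i_seg                  # reading when segment k is the sense element
    return estimates.mean()                # full-rotation average

rng = np.random.default_rng(1)
errs = rng.normal(0, 0.05, size=8)         # hypothetical 5%-sigma segment mismatch
print(dem_sensed_current(2.0, errs))       # -> 2.0 (mismatch averaged out)
print(8 * segment_currents(2.0, errs)[0])  # single-segment sensing: biased by its mismatch
```
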
Date Created
2017
Agent

System Identification, Diagnosis, and Built-In Self-Test of High Switching Frequency DC-DC Converters

Description
Complex electronic systems include multiple power domains and drastically varying dynamic power consumption patterns, requiring the use of multiple power conversion and regulation units. High-frequency switching converters have been gaining prominence in the DC-DC converter market due to their smaller solution size (higher power density) and higher efficiency. As the filter components become smaller in value and size, they are unfortunately also subject to higher process variations and worse degradation profiles, jeopardizing stable operation of the power supply. This dissertation presents techniques to track changes in the dynamic loop characteristics of DC-DC converters without disturbing the normal mode of operation. A digital pseudo-noise (PN) stimulus is used to excite the DC-DC system at various circuit nodes to calculate the corresponding closed-loop impulse response. The test signal energy is spread over a wide bandwidth, and the signal analysis is achieved by correlating the PN input sequence with the disturbed output, thereby accumulating the desired behavior over time. A mixed-signal cross-correlation circuit is used to derive on-chip impulse responses, with smaller memory and lower computational requirements than a digital correlator approach. Model-reference-based parametric and non-parametric techniques are discussed to analyze the impulse response results in both the time and frequency domains.

The proposed techniques can extract the open-loop phase margin and the closed-loop unity-gain frequency within 5.2% and 4.1% error, respectively, for a load current range of 30-200 mA. Converter parameters such as the natural frequency (ω_n), quality factor (Q), and center frequency (ω_c) can be estimated within 3.6%, 4.7%, and 3.8% error, respectively, over a load inductance of 4.7-10.3 µH and a filter capacitance of 200-400 nF. A 5 MHz switching frequency, 5-8.125 V input voltage range, voltage-mode controlled DC-DC buck converter is designed for the proposed built-in self-test (BIST) analysis. The converter output voltage range is 3.3-5 V and the supported maximum load current is 450 mA. The peak efficiency of the converter is 87.93%. The proposed converter is fabricated in a 0.6 µm 6-layer-metal silicon-on-insulator (SOI) technology with a die area of 9 mm². The area impact of the system identification blocks, including the related I/O structures, is 3.8%, and they consume 530 µA of quiescent current during operation.
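
The correlation step works because a maximal-length PN sequence has a nearly impulse-like periodic autocorrelation, so cross-correlating the injected PN stimulus with the observed output recovers the closed-loop impulse response. Below is a small numerical sketch of that identification idea (not the on-chip mixed-signal correlator), using an assumed second-order discrete-time loop as the system under test.

```python
import numpy as np
from scipy.signal import lfilter, max_len_seq

# Hypothetical closed-loop dynamics to identify: a lightly damped 2nd-order IIR system
b, a = [0.05, 0.04], [1.0, -1.6, 0.68]

# +/-1 maximal-length PN stimulus; drive two periods and keep the steady-state one
pn = 2.0 * max_len_seq(10)[0] - 1.0                  # length 1023
y = lfilter(b, a, np.tile(pn, 2))[len(pn):]          # "disturbed output" at the probed node

# Circular cross-correlation of output with input recovers the impulse response,
# since the PN sequence's periodic autocorrelation is nearly a delta function.
N = len(pn)
h_est = np.real(np.fft.ifft(np.fft.fft(y) * np.conj(np.fft.fft(pn)))) / N

h_true = lfilter(b, a, np.r_[1.0, np.zeros(63)])     # first 64 taps of the true response
print(np.allclose(h_est[:64], h_true, atol=5e-3))    # True, up to the ~1/N correlation floor
```
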
Date Created
2017
Agent

Digital Controlled Multi-phase Buck Converter with Accurate Voltage and Current Control

Description
A 4-phase, quasi-current-mode hysteretic buck converter with digital frequency synchronization, online comparator offset calibration, and digital current-sharing control is presented. The switching frequency of the hysteretic converter is digitally synchronized to the input clock reference with less than ±1.5% error over the switching frequency range of 3-9.5 MHz. The online offset calibration cancels the input-referred offset of the hysteretic comparator and enables ±1.1% voltage regulation accuracy. A maximum current-sharing error of ±3.6% is achieved by a duty-cycle-calibrated, delay-line-based PWM generator, without affecting the phase synchronization timing sequence. In light-load conditions, individual converter phases can be disabled, and the output stage of the power converter is segmented for high efficiency. The DC-DC converter achieves 93% peak efficiency for Vi = 2 V and Vo = 1.6 V.
Date Created
2017
Agent

Accelerated Aging in Devices and Circuits

Description
The aging mechanisms in devices are prone to uncertainties due to dynamic stress conditions. In AMS circuits, these can lead to momentary fluctuations in circuit voltage that may be missed by a compact model and hence cause unpredictable failures. Firstly, multiple aging effects in the devices may have underlying correlations: the generation of new traps during TDDB may significantly accelerate BTI, since these traps are close to the dielectric-Si interface in scaled technologies. Secondly, the prevalent reliability analysis lacks a direct validation of the lifetime of devices and circuits. The BTI aging mechanism causes gradual degradation of the device, leading to a threshold voltage shift and an increasing failure rate. In the 28nm HKMG technology, the contribution of BTI to NMOS degradation has become significant at high temperature as compared to channel hot carrier (CHC) degradation. This requires revising the end-of-lifetime (EOL) calculation based on the contributions from individual aging effects, especially in feedback loops. Conventionally, aging in devices is extrapolated from a short-term measurement, but this practice results in unreliable prediction of EOL caused by variability in initial parameters and stress conditions. To mitigate the extrapolation issues and improve predictability, this work aims at providing a new approach to test the device to EOL in a fast and controllable manner. The contributions of this thesis include: (1) based on the stochastic trapping/de-trapping mechanism, new compact BTI models are developed and verified with 14nm FinFET and 28nm HKMG data; moreover, these models are implemented in circuit simulation, illustrating a significant increase in failure rate due to accelerated BTI; (2) developing a model to predict accelerated aging under special conditions such as feedback loops and stacked inverters; (3) introducing a feedback-loop-based test methodology called Adaptive Accelerated Aging (AAA) that can generate accurate aging data until EOL; (4) presenting simulation and experimental data for the models and providing test setups for multiple stress conditions, including those for achieving EOL in 1 hour for a device as well as for a ring oscillator (RO) circuit, for validation of the proposed methodology; and (5) scaling these models to find a guard band for VLSI design circuits that provides a realistic aging impact.
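
Stochastic trapping/de-trapping BTI models typically describe each defect by capture and emission time constants, with the total threshold-voltage shift being the sum of the occupied traps' contributions. The sketch below is a generic Monte Carlo illustration of that picture; the time constants and per-trap shifts are made-up values, not the thesis's calibrated compact model.

```python
import numpy as np

def bti_vth_shift(t_stress, n_traps=200, seed=0):
    """Toy stochastic trapping model of BTI-induced threshold-voltage shift.

    Each defect has a log-uniformly distributed capture time constant and a
    small per-trap dVth contribution; under DC stress the capture probability
    after time t is 1 - exp(-t/tau_c). Summing the occupied traps gives the
    total shift, which grows roughly log-like with stress time.
    All parameter values here are illustrative, not calibrated.
    """
    rng = np.random.default_rng(seed)
    tau_c = 10 ** rng.uniform(-6, 6, n_traps)          # capture constants, 1 us .. 1e6 s
    dvth_per_trap = rng.exponential(0.5e-3, n_traps)   # ~0.5 mV average per trap
    p_captured = 1.0 - np.exp(-t_stress / tau_c)
    occupied = rng.random(n_traps) < p_captured
    return dvth_per_trap[occupied].sum()

for t in (1e-3, 1.0, 1e3, 1e6):                        # stress times in seconds
    print(f"t = {t:8.0e} s  ->  dVth ~ {1e3 * bti_vth_shift(t):5.2f} mV")
```
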
Date Created
2017
Agent