Bliss, Daniel

Accelerator Design And Hardware Implementation For Distributed Coherent Mesh Beamformer

Description

This dissertation summarizes the achievements and ongoing designs of Field-Programmable Gate Array (FPGA) accelerators for Distributed Coherent Mesh Beamforming (DCMB). The goal of the distributed coherent network beamforming program is to create a network of distributed beams. The radios that make up this network must be small in size, weight, power, and cost while still being able to overcome long transmission distances and interference. Despite these constraints, a robust communication link can be established by using coherent beamforming to significantly increase signal strength and reduce interference. Two paths were developed to calculate the beamformer for the target platforms. One path is purely FPGA-based. The other is a hybrid approach that uses the FPGA for some of the initial calculations and performs the rest on the Central Processing Unit (CPU). Performing the calculations on the FPGA significantly reduced overall latency. DCMB has emerged as a technology for improving wireless communication systems, providing adaptability and efficiency in dynamic environments. This dissertation presents an in-depth study of DCMB, with specific innovations in accelerator design and overall controller architecture. I investigate the design and implementation of dedicated accelerators tailored to DCMB tasks, including Finite Impulse Response (FIR) filtering, matrix multiplication, QR decomposition, and compensation on FPGA platforms. These accelerators are optimized for real-time processing and better performance in DCMB systems. My research shows that, compared to soft-core processors, hardware accelerators provide significantly faster processing and reduced latency in communication systems. In addition, I discuss the design and integration of a general controller that optimizes the operation of the accelerators and coordinates the beamforming process between distributed nodes.
Through experiments with analytical and simulation tools, my study highlights the superiority of hardware accelerators over soft-core processors for high-speed calculation tasks in DCMB systems.
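Two of the listed DCMB kernels, FIR filtering and the final beamformer combine, can be illustrated in a few lines. The sketch below is a plain floating-point reference model, not the dissertation's FPGA design; the function names `fir_filter` and `beamform` are hypothetical, and the narrowband weighting `y[n] = sum_i conj(w_i) x_i[n]` is an assumed convention.

```python
def fir_filter(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k], with zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k < 0:
                break  # taps beyond the start of the signal see zeros
            acc += hk * x[n - k]
        y.append(acc)
    return y


def beamform(channels, weights):
    """Narrowband beamformer: y[n] = sum_i conj(w_i) * x_i[n] across antenna channels."""
    return [
        sum(w.conjugate() * ch[n] for w, ch in zip(weights, channels))
        for n in range(len(channels[0]))
    ]
```

A hardware accelerator would pipeline the inner multiply-accumulate loop across taps and channels; the reference model is mainly useful for checking fixed-point outputs against a known-good result.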

Date Created

2024

Agent

Author (aut): Li, Yang
Thesis advisor (ths): Bliss, Daniel
Committee member: Chakrabarti, Chaitali
Committee member: Alkhateeb, Ahmed
Committee member: Papandreou, Antonia
Publisher (pbl): Arizona State University

Reconfigurable Intelligent Surfaces for Next-Generation Communication and Sensing Systems

Description

With the rapid development of reflect-arrays and software-defined meta-surfaces, reconfigurable intelligent surfaces (RISs) have been envisioned as promising technologies for next-generation wireless communication and sensing systems. These surfaces comprise massive numbers of nearly-passive elements that interact with the incident signals in a smart way to improve the performance of such systems. In RIS-aided communication systems, however, designing this smart interaction requires acquiring large-dimensional channel knowledge between the RIS and the transmitter/receiver. Acquiring this knowledge is one of the most crucial challenges in RISs, as it is associated with large computational and hardware complexity. For RIS-aided sensing systems, it is interesting to first investigate scene depth perception based on millimeter wave (mmWave) multiple-input multiple-output (MIMO) sensing. While mmWave MIMO sensing systems address some critical limitations suffered by optical sensors, realizing these systems poses several key challenges: communication-constrained sensing framework design, beam codebook design, and scene depth estimation. Given the high spatial resolution provided by RISs, RIS-aided mmWave sensing systems have the potential to improve scene depth perception, while also imposing some key challenges. In this dissertation, for RIS-aided communication systems, efficient RIS interaction design solutions are proposed by leveraging tools from compressive sensing and deep learning. The achievable rates of these solutions approach the upper bound, which assumes perfect channel knowledge, with negligible training overhead. For RIS-aided sensing systems, a mmWave MIMO based sensing framework is first developed for building accurate depth maps under the constraints imposed by the communication transceivers. Then, a scene depth estimation framework based on RIS-aided sensing is developed for building high-resolution, accurate depth maps.
Numerical simulations illustrate the promising performance of the proposed solutions, highlighting their potential for next-generation communication and sensing systems.
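The "perfect channel knowledge" upper bound mentioned above can be made concrete for the simplest case. The sketch below assumes a single-antenna transmitter and receiver, no direct path, and known per-element channels `h_tx[n]` (transmitter to RIS element n) and `h_rx[n]` (RIS element n to receiver); these names are hypothetical. With full knowledge, choosing each phase shift to cancel the phase of its cascaded channel makes all reflected paths add coherently. This is a standard textbook configuration, not the compressive-sensing or deep-learning designs proposed in the dissertation, which aim to approach this bound without acquiring the full channels.

```python
import cmath


def ris_phases(h_tx, h_rx):
    """Phase shifts that co-phase every cascaded path h_tx[n] * h_rx[n] (full CSI assumed)."""
    return [-cmath.phase(a * b) for a, b in zip(h_tx, h_rx)]


def received_gain(h_tx, h_rx, thetas):
    """|sum_n h_rx[n] * exp(j*theta_n) * h_tx[n]|^2 for a single-antenna link with no direct path."""
    total = sum(a * b * cmath.exp(1j * t) for a, b, t in zip(h_tx, h_rx, thetas))
    return abs(total) ** 2
```

With co-phased reflections the gain reaches `(sum_n |h_tx[n]| |h_rx[n]|)^2`, which grows with the square of the number of elements; that scaling is why large RISs are attractive and why acquiring their large-dimensional channels is costly.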

Date Created

2023

Agent

Author (aut): Taha, Abdelrahman
Thesis advisor (ths): Alkhateeb, Ahmed
Committee member: Bliss, Daniel
Committee member: Tepedelenlioğlu, Cihan
Committee member: Michelusi, Nicolò
Publisher (pbl): Arizona State University

Improving the Programmability of a Systolic Array Processor

Description

This thesis presents a code generation tool to improve the programmability of systolic array processors such as the Domain Adaptive Processor (DAP), which was designed by researchers at the University of Michigan for wireless communication workloads. Unlike application-specific integrated circuits, DAP aims to achieve high performance without sacrificing much programmability or reconfigurability. The structure of typical DAP code for each Processing Element (PE) is very different from that of any other programming language. As a result, writing code for DAP requires the programmer to acquire processor-specific knowledge, including configuration rules and the cycle-accurate execution state of the memory and datapath components within each PE. Each program must be carefully handcrafted to meet strict timing and resource constraints, leading to very long programming times and low productivity. In this thesis, a code generation and optimization tool is introduced to improve the programmability of DAP and make code development easier. The tool consists of a configuration code generator, an optimizer, and a scheduler. An Instruction Set Architecture (ISA) has been designed specifically for DAP. The programmer writes the assembly code for each PE using the DAP ISA. The assembly code is then translated into low-level configuration code, which undergoes several optimization passes. Level 1 (L1) optimization removes instruction redundancy and performs loop optimizations through code movement. Level 2 (L2) optimization exploits instruction-level parallelism. Together, the L1 and L2 optimization passes produce code that has fewer instructions and requires fewer cycles. In addition, a scheduling tool performs final timing adjustments on the code to match the input data rate.
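The idea behind an instruction-level-parallelism pass like L2 can be shown on a toy program. The sketch below is hypothetical and does not use the real DAP ISA: each instruction is a `(dest, op, srcs)` tuple, every destination register is written exactly once (so only read-after-write dependencies matter), every instruction has unit latency, and issue width is unlimited. Under those assumptions, each instruction is placed in the earliest cycle after all of its operands are ready.

```python
def schedule_ilp(program):
    """Greedy ASAP bundling of a toy single-assignment program.

    program: list of (dest, op, srcs) tuples in program order.
    Returns a list of bundles; all instructions in one bundle issue in the same cycle.
    Registers never written by the program are treated as inputs ready at cycle 0.
    """
    ready_cycle = {}  # register name -> first cycle its value can be read
    bundles = []
    for dest, op, srcs in program:
        cycle = max((ready_cycle.get(s, 0) for s in srcs), default=0)
        while len(bundles) <= cycle:
            bundles.append([])
        bundles[cycle].append((dest, op, srcs))
        ready_cycle[dest] = cycle + 1  # unit latency assumed
    return bundles
```

For example, `a = in0 + in1`, `b = in2 * in3`, `c = a + b`, `d = in0 - in3` packs into two bundles, `[a, b, d]` then `[c]`, instead of four sequential cycles. A real DAP pass would also have to respect per-PE resource limits and write-after-read hazards, which this sketch deliberately ignores.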

Date Created

2022

Agent

Author (aut): Vipperla, Anish
Thesis advisor (ths): Chakrabarti, Chaitali
Committee member: Bliss, Daniel
Committee member: Akoglu, Ali
Publisher (pbl): Arizona State University

Characterization and Testing of the Weighted-Overlap-and-Add High-Speed Polyphase Filterbank

Description

The Discrete Fourier Transform (DFT) is a mathematical operation used in various signal processing applications, including astronomy and digital communications (satellite, cellphone, radar, etc.), to separate signals at different frequencies. Applying the DFT to a signal on its own suffers from inter-channel leakage. For an ultrasensitive application such as radio astronomy, it is important to minimize frequency sidelobes. To achieve this, the Polyphase Filterbank (PFB) technique is used, which shapes the bin response of the DFT into an approximately rectangular function and suppresses out-of-band crosstalk. This helps achieve the Signal-to-Noise Ratio (SNR) required for astronomy measurements. In practice, 2^N-point DFTs can be efficiently implemented on Digital Signal Processing (DSP) hardware using the popular Fast Fourier Transform (FFT) algorithm. Hence, 2^N-tap filters are commonly used in the filterbank stage before the FFT. At present, high-performance Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs) are available from different vendors (e.g., Xilinx, Altera, Microsemi). The Xilinx Radio-Frequency System-on-Chip (RFSoC) is the latest such platform, offering Radio-Frequency (RF) signal capture and generation on the same chip. This thesis describes the characterization of the Analog-to-Digital Converter (ADC) available on the Xilinx ZCU111 RFSoC platform, the detailed design steps of a Critically-Sampled PFB, and the testing and debugging of a Weighted OverLap and Add (WOLA) PFB to examine the feasibility of implementation on custom ASICs for future space missions. The design and testing of an analog Printed Circuit Board (PCB) circuit for biasing cryogenic detectors and readout components are also presented.
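The PFB front end described above (weight a block of samples with a prototype filter, overlap-add its segments, then DFT) can be sketched as follows. Assumptions: `M` channels, `P` taps per channel, and a Hann-windowed sinc prototype of length `M*P` whose passband is about one channel wide; this prototype is an illustrative choice rather than the filter used in the thesis, and the naive O(M^2) DFT stands in for an FFT.

```python
import cmath
import math


def prototype_filter(M, P):
    """Hann-windowed sinc low-pass of length M*P (passband roughly one channel wide)."""
    N = M * P
    h = []
    for n in range(N):
        x = (n - (N - 1) / 2) / M
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1))
        h.append(sinc * hann)
    return h


def dft(v):
    """Naive DFT, X[k] = sum_n v[n] * exp(-2j*pi*k*n/M); an FFT would be used in practice."""
    M = len(v)
    return [sum(v[n] * cmath.exp(-2j * math.pi * k * n / M) for n in range(M)) for k in range(M)]


def pfb_spectrum(block, h, M, P):
    """One critically sampled PFB output frame from M*P input samples."""
    # Weight by the prototype filter, then overlap-add the P segments of length M.
    folded = [sum(block[p * M + m] * h[p * M + m] for p in range(P)) for m in range(M)]
    return dft(folded)
```

Successive output frames come from sliding the `M*P` input window forward by `M` samples, which keeps the filterbank critically sampled. A complex tone at bin 3 of a 16-channel, 4-tap bank, for example, produces its largest output in channel 3.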

Date Created

2022

Agent

Author (aut): Biswas, Raj
Thesis advisor (ths): Mauskopf, Philip
Thesis advisor (ths): Bliss, Daniel
Committee member: Hooks, Tracee J
Committee member: Groppi, Christopher
Committee member: Zeinolabedinzadeh, Saeed
Publisher (pbl): Arizona State University

Reconfigurable RF Transmitters for C-Band and X-Band: Design, Development and Testing

Description

This thesis covers the design, development and testing of two high-power radio frequency transmitters that operate in C-band and X-band (System-C/X). The operational bands of System-C/X are 3-6 GHz and 8-11 GHz, respectively. Each system is designed to produce a peak effective isotropic radiated power of at least 50 dBW. The transmitters use parabolic dish antennas with dual-linear polarization feeds that can be steered over a wide range of azimuths and elevations with a precision of a fraction of a degree. System-C/X's transmit waveforms are generated using software-defined radios. The software-defined radio software is lightweight and reconfigurable. New waveforms can be loaded into the system during operation and saved to an onboard database. The waveform agility of the two systems lends them to potential uses in a wide range of broadcasting applications, including radar and communications. The effective isotropic radiated power and beam patterns for System-C/X were measured during two field test events in July 2021 and January 2022. The performance of both systems was found to be within acceptable limits of their design specifications.
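The 50 dBW EIRP target can be related to transmitter power and dish size with the standard link-budget relations `EIRP (dBW) = P_tx (dBW) + G (dBi) - losses (dB)` and, for a parabolic reflector, `G = eta * (pi * D / lambda)^2`. The sketch below uses those textbook formulas; the aperture efficiency default of 0.55 and any example numbers are assumptions, not the measured parameters of System-C/X.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def dish_gain_dbi(diameter_m, freq_hz, efficiency=0.55):
    """Parabolic dish gain G = eta * (pi * D / lambda)^2, in dBi (efficiency is an assumption)."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return 10 * math.log10(efficiency * (math.pi * diameter_m / wavelength) ** 2)


def eirp_dbw(tx_power_w, antenna_gain_dbi, losses_db=0.0):
    """EIRP = transmit power (dBW) + antenna gain (dBi) - feed/cable losses (dB)."""
    return 10 * math.log10(tx_power_w) + antenna_gain_dbi - losses_db
```

For instance, a hypothetical 1 m dish at 10 GHz with 55% efficiency has roughly 38 dBi of gain, so reaching 50 dBW would need on the order of 17 W delivered to the feed before losses. Because gain scales with frequency squared, the same dish gives noticeably less gain in C-band than in X-band.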

Date Created

2022

Agent

Author (aut): Gordon, Samuel
Thesis advisor (ths): Bliss, Daniel
Thesis advisor (ths): Mauskopf, Philip
Committee member: Papandreou-Suppappola, Antonia
Publisher (pbl): Arizona State University

Measurement, Detection, and Parameter Estimation of Single Photon Correlations

Description

The continuous time-tagging of photon arrival times for high count rate sources is necessary for applications such as optical communications, quantum key encryption, and astronomical measurements. Detection of Hanbury-Brown and Twiss (HBT) single photon correlations from thermal sources, such as stars, requires a combination of high dynamic range, long integration times, and low systematics in the photon detection and time tagging system. The continuous nature of the measurements and the need for highly accurate timing resolution require a customized time-to-digital converter (TDC). A custom-built, two-channel, field programmable gate array (FPGA)-based TDC capable of continuously time-tagging single photons with sub-clock-cycle timing resolution was characterized. Auto-correlation and cross-correlation measurements were used to constrain spurious systematic effects in the pulse count data as a function of system variables. These variables included, but were not limited to, incident photon count rate, incoming signal attenuation, and measurements of fixed signals. Additionally, a generalized likelihood ratio test (GLRT) using maximum likelihood estimators (MLEs) was derived as a means to detect and estimate correlated photon signal parameters. The derived GLRT was capable of detecting correlated photon signals in a laboratory setting with a high degree of statistical confidence. A proof is presented in which the MLE for the amplitude of the correlated photon signal is shown to be the minimum variance unbiased estimator (MVUE). The fully characterized TDC was used in preliminary measurements of astronomical sources using ground-based telescopes. Finally, preliminary theoretical groundwork is established for the deep space optical communications system of the proposed Breakthrough Starshot project, in which low-mass craft will travel to the Alpha Centauri system to collect scientific data from Proxima b.
This theoretical groundwork utilizes recent and upcoming space based optical communication systems as starting points for the Starshot communication system.
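The cross-correlation measurements above amount to histogramming arrival-time differences between the two TDC channels. The sketch below is a generic time-tag cross-correlator, not the thesis's processing chain: it assumes both timestamp lists are sorted and in the same units, and the name `cross_correlation_histogram` is hypothetical. A correlated (HBT-like) signal shows up as an excess of counts in the bins near the true delay.

```python
import bisect


def cross_correlation_histogram(t_a, t_b, max_lag, bin_width):
    """Histogram of (t_b - t_a) over all pairs with -max_lag <= lag < max_lag.

    t_a, t_b: sorted timestamps from two detector channels (same time units).
    Bin i covers lags [-max_lag + i*bin_width, -max_lag + (i+1)*bin_width).
    """
    n_bins = int(2 * max_lag // bin_width)
    counts = [0] * n_bins
    for ta in t_a:
        # Binary search limits the inner loop to channel-B tags inside the lag window.
        lo = bisect.bisect_left(t_b, ta - max_lag)
        hi = bisect.bisect_right(t_b, ta + max_lag)
        for tb in t_b[lo:hi]:
            idx = int((tb - ta + max_lag) // bin_width)
            if 0 <= idx < n_bins:
                counts[idx] += 1
    return counts
```

Using sorted timestamps and binary search keeps the cost proportional to the number of nearby pairs rather than to all pairs, which matters for the long, high count rate integrations described above.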

Date Created

2022

Agent

Author (aut): Hodges, Todd Michael William
Thesis advisor (ths): Mauskopf, Philip
Thesis advisor (ths): Trichopoulos, George
Committee member: Papandreou-Suppappola, Antonia
Committee member: Bliss, Daniel
Publisher (pbl): Arizona State University

Precision Navigation using Two-Way Ranging: Bounds and Performance

Description

Localization tasks using two-way ranging (TWR) are making headway in modern-day navigation applications as an alternative to legacy global navigation satellite systems (GNSS) such as GPS. The current literature does not provide a closed-form expression for estimation performance bounds on position and attitude when a TWR system is employed. Cramér-Rao lower bounds (CRLBs) are derived for position and orientation estimation using both 2-D and 3-D geometries. A literature review is performed to give background and detail on the tools needed for a thorough analysis of this problem. Popular least squares techniques and solutions to Wahba's problem are compared to the derived bounds using Monte Carlo simulations, as proof of correctness. Estimation performance for non-stationary users is also briefly explored using an Extended Kalman Filter, as an introduction to future extensions of this work. Applications such as the CHP2 system are also discussed to show that a secure, inexpensive, and robust implementation of TWR is highly feasible.
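Two building blocks of the analysis above can be sketched briefly: converting a TWR timing exchange to range, and a position-only CRLB in 2-D. The sketch assumes a known responder turnaround time, independent Gaussian range errors with equal standard deviation `sigma`, and known anchor locations; it does not include the orientation (attitude) terms that the thesis derives. Under these assumptions the Fisher information is `(1/sigma^2) * sum_i u_i u_i^T`, where `u_i` is the unit vector from anchor i to the target, and the bound on mean squared position error is the trace of its inverse.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def twr_range(t_round, t_reply):
    """Range from two-way ranging: one-way time of flight = (round trip - turnaround) / 2."""
    return C * (t_round - t_reply) / 2


def position_crlb_2d(anchors, target, sigma):
    """Lower bound on E[||p_hat - p||^2] (m^2) for 2-D position from range measurements.

    Assumes i.i.d. Gaussian range errors with std sigma (m); orientation is not estimated.
    """
    f11 = f12 = f22 = 0.0
    for ax, ay in anchors:
        dx, dy = target[0] - ax, target[1] - ay
        d = math.hypot(dx, dy)
        ux, uy = dx / d, dy / d
        f11 += ux * ux
        f12 += ux * uy
        f22 += uy * uy
    det = f11 * f22 - f12 * f12  # zero if all anchors are collinear with the target
    # trace of inverse of [[f11, f12], [f12, f22]] is (f11 + f22) / det, scaled by sigma^2
    return sigma ** 2 * (f11 + f22) / det
```

With four anchors placed symmetrically around the target, the bound reduces to `sigma^2`; clustering the anchors on one side inflates it, which is the geometric-dilution effect a Monte Carlo comparison against least squares would reveal.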

Date Created

2022

Agent

Author (aut): Welker, Samuel
Thesis advisor (ths): Bliss, Daniel
Committee member: Herschfelt, Andrew
Committee member: Tsakalis, Konstantinos
Publisher (pbl): Arizona State University

Audio Waveform Sample SVD Compression and Impact on Performance

Description

Lossy compression is a form of compression that slightly degrades a signal in ways that are ideally not detectable by the human ear. This contrasts with lossless compression, in which the sample is not degraded at all. While lossless compression may seem like the best option, lossy compression, which is used for most audio and video, reduces transmission time and produces much smaller files. However, excessive compression can noticeably reduce quality. The more a waveform is compressed, the more it is degraded, and once a file has been lossy compressed, the process cannot be reversed. This project will observe the degradation of an audio signal after the application of Singular Value Decomposition (SVD) compression, a lossy compression method that eliminates singular values from a matrix formed from the signal.
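A minimal sketch of SVD compression of an audio signal, assuming the 1-D samples are reshaped row-wise into a `rows x cols` matrix (any trailing samples that do not fill a row are dropped) and that the `k` largest singular values are kept. The reshaping scheme and the function name `svd_compress` are illustrative assumptions; the project's own framing of the signal into a matrix may differ.

```python
import numpy as np


def svd_compress(samples, rows, k):
    """Rank-k approximation of audio samples reshaped into a rows x cols matrix."""
    cols = len(samples) // rows
    A = np.asarray(samples[: rows * cols], dtype=float).reshape(rows, cols)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in descending order
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]  # keep the k largest singular values
    return A_k.reshape(-1)
```

Storing the truncated factors takes `k * (rows + cols + 1)` numbers instead of `rows * cols`, so compression improves as `k` shrinks, while the reconstruction error can only grow: the rank-k truncation is the best rank-k approximation in the Frobenius norm, so discarded singular values cannot be recovered.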

Date Created

2021-05

Agent

Author (aut): Hirte, Amanda
Thesis director: Kosut, Oliver
Committee member: Bliss, Daniel
Contributor (ctb): Electrical Engineering Program
Contributor (ctb): Barrett, The Honors College

Accelerator for Flexible QR Decomposition and Back Substitution

Description

QR decomposition (QRD) is one of the most common linear algebra operations used for the decomposition of a square or non-square matrix. It has a wide range of applications, especially in Multiple Input-Multiple Output (MIMO) communication systems. Unfortunately, it has high computational complexity: for a matrix of size n×n, QRD has O(n^3) complexity, and back substitution, which is used to solve a system of linear equations, has O(n^2) complexity. Thus, as the matrix size increases, the hardware resource requirement for QRD and back substitution increases significantly. This thesis presents the design and implementation of a flexible QRD and back substitution accelerator using a folded architecture. It can support matrix sizes of 4x4, 8x8, 12x12, 16x16, and 20x20 with low hardware resource requirements. The proposed architecture is based on the systolic array implementation of the Givens algorithm for QRD. It is built with three different types of computation blocks, which are connected in a 2-D array structure. These blocks are controlled by a scheduler that allows the blocks to be reused to perform computation for any input matrix size that is a multiple of 4. These blocks are designed using two basic programming elements, which support both the forward and backward paths to compute matrix R in QRD and column matrix X in back substitution. The proposed architecture has been mapped to a Xilinx Zynq UltraScale+ FPGA (Field Programmable Gate Array) on the ZCU102 board. All inputs are complex with a precision of 40 bits (38 fractional bits and 1 sign bit). The architecture can be clocked at 50 MHz. The synthesis results of the folded architecture for different matrix sizes are presented. The results show that the folded architecture can support QRD and back substitution for large inputs that otherwise cannot fit on an FPGA when implemented using a flat architecture. The memory sizes required for different matrix sizes are also presented.
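The Givens-based QRD followed by back substitution can be sketched in software to show the data flow the systolic blocks implement. This is a real-valued floating-point reference, not the thesis's complex 40-bit fixed-point hardware; `givens_qr_solve` is a hypothetical name. Each rotation zeroes one sub-diagonal entry, and the same rotation is applied to the right-hand side, so `Q` never has to be formed explicitly.

```python
import math


def givens_qr_solve(A, b):
    """Solve A x = b for square, nonsingular A via Givens QR and back substitution."""
    n = len(A)
    R = [row[:] for row in A]  # working copy; becomes upper triangular
    y = list(b)                # becomes Q^T b
    for j in range(n):
        # Zero column j from the bottom up using rotations of adjacent rows (i-1, i).
        for i in range(n - 1, j, -1):
            a, bval = R[i - 1][j], R[i][j]
            if bval == 0:
                continue
            r = math.hypot(a, bval)
            c, s = a / r, bval / r
            for k in range(j, n):  # columns left of j are already zero in both rows
                t1, t2 = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * t1 + s * t2
                R[i][k] = -s * t1 + c * t2
            y[i - 1], y[i] = c * y[i - 1] + s * y[i], -s * y[i - 1] + c * y[i]
    # Back substitution on R x = Q^T b: O(n^2), versus O(n^3) for the QRD itself.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = y[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = acc / R[i][i]
    return x
```

The two nested loops map naturally onto the forward path (rotations that build R) and the backward path (solving for X) described in the abstract; a folded hardware design reuses the same rotation blocks across tiles of a larger matrix instead of instantiating one block per matrix element.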

Date Created

2020

Agent

Author (aut): Kanagala, Srimayee
Thesis advisor (ths): Chakrabarti, Chaitali
Committee member: Bliss, Daniel
Committee member: Cao, Yu (Kevin)
Publisher (pbl): Arizona State University

RNS-Based NTT Polynomial Multiplier for Lattice-Based Cryptography

Description

Lattice-based cryptography is an emerging field of cryptography that uses the difficulty of lattice problems to design cryptosystems that are resistant to quantum attacks and applicable to Fully Homomorphic Encryption (FHE) schemes. In this thesis, the parallelism of the Residue Number System (RNS) and the algorithmic efficiency of the Number Theoretic Transform (NTT) are combined to tackle the most significant bottleneck, polynomial ring multiplication, with the hardware design of an optimized RNS-based NTT polynomial multiplier. The design uses negative wrapped convolution, the NTT, RNS Montgomery reduction with Bajard and Shenoy extensions, and optimized modular 32-bit channel arithmetic for nine RNS channels to accomplish an RNS polynomial multiplication. In addition to a full software implementation of the whole system, a pipelined and optimized RNS-based NTT unit with 4 RNS butterflies is implemented on the Xilinx Artix-7 FPGA (xc7a200tlffg1156-2L) for size and delay estimates. The hardware implementation achieves an operating frequency of 47.043 MHz and uses 13239 LUTs, 4010 FFs, and 330 DSP blocks, allowing multiple NTT units to operate simultaneously depending on FPGA size constraints.
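The combination of negative wrapped convolution, the NTT, and RNS channels can be illustrated at toy scale. The sketch below assumes polynomials of power-of-two length `n` and small pairwise-coprime moduli with `q = 1 (mod 2n)` (here 17 and 97 instead of nine 32-bit channels). It uses a naive O(n^2) transform rather than butterflies, and plain CRT reconstruction in place of the RNS Montgomery reduction with Bajard and Shenoy extensions used in the thesis. All function names are hypothetical.

```python
def find_psi(n, q):
    """Smallest psi with psi^n = -1 (mod q), a primitive 2n-th root of unity for power-of-two n."""
    for g in range(2, q):
        if pow(g, n, q) == q - 1:
            return g
    raise ValueError("no primitive 2n-th root of unity; need q = 1 (mod 2n)")


def ntt(a, w, q):
    """Naive NTT: A[i] = sum_j a[j] * w^(i*j) mod q."""
    n = len(a)
    return [sum(a[j] * pow(w, i * j, q) for j in range(n)) % q for i in range(n)]


def negacyclic_mul_mod(a, b, q):
    """a*b mod (x^n + 1, q) via negative wrapped convolution: pre-weight by psi^i, NTT, post-weight."""
    n = len(a)
    psi = find_psi(n, q)
    w = psi * psi % q  # primitive n-th root of unity
    a_hat = ntt([a[i] * pow(psi, i, q) % q for i in range(n)], w, q)
    b_hat = ntt([b[i] * pow(psi, i, q) % q for i in range(n)], w, q)
    c_hat = [x * y % q for x, y in zip(a_hat, b_hat)]
    c = ntt(c_hat, pow(w, -1, q), q)  # inverse transform, before the 1/n scaling
    n_inv, psi_inv = pow(n, -1, q), pow(psi, -1, q)
    return [c[i] * n_inv * pow(psi_inv, i, q) % q for i in range(n)]


def rns_negacyclic_mul(a, b, moduli):
    """Multiply independently in each RNS channel, then recombine with the CRT."""
    Q = 1
    for q in moduli:
        Q *= q
    residues = [negacyclic_mul_mod([x % q for x in a], [x % q for x in b], q) for q in moduli]
    out = []
    for i in range(len(a)):
        acc = 0
        for q, r in zip(moduli, residues):
            Mq = Q // q
            acc += r[i] * Mq * pow(Mq, -1, q)
        out.append(acc % Q)
    return out


def schoolbook_negacyclic(a, b, Q):
    """Reference: direct product with x^n = -1 wrap-around, reduced mod Q."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]
    return [x % Q for x in c]
```

Each RNS channel runs an independent, narrow-width NTT multiply, which is what lets hardware process channels in parallel; only the final recombination touches the full modulus. `pow(x, -1, q)` requires Python 3.8 or later.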

Date Created

2020

Agent

Author (aut): Brist, Logan Alan
Thesis advisor (ths): Chakrabarti, Chaitali
Committee member: Papandreou-Suppappola, Antonia
Committee member: Bliss, Daniel
Publisher (pbl): Arizona State University
