Self-aware, Adaptive General-purpose Computing Systems

With the breakdown of Dennard scaling, computer architects can no longer rely on integrated circuit energy efficiency to scale with transistor density, and must under-clock or power-gate parts of their designs to fit within given power budgets. Hardware accelerators may improve the energy efficiency of some compute-intensive tasks, but as more tasks are accelerated, the general-purpose portions of workloads account for a larger share of execution time while also leaving less instruction-, data-, or task-level parallelism to exploit. Adaptive computing systems have the potential to address these challenges by modifying their behavior at runtime. Adaptation requires runtime decision-making, which can be performed in hardware or in software. While software-based decision-making is more flexible and can execute more complex operations than hardware, it also incurs significant latency and power overhead. Hardware designs are more limited in the space of decisions they can make, but they have direct access to their own internal microarchitectural state and can make faster decisions, allowing for better-informed adaptation and extracting previously unobtainable performance and security benefits. In this dissertation I study (i) the viability and trade-offs of general-purpose adaptive systems, (ii) the difficulty and complexity of making adaptation decisions, and (iii) how time spent in the observation-analysis-adaptation cycle affects adaptation benefits. I introduce techniques for (a) modeling and understanding high-performance computing systems and microarchitecture, (b) enabling hardware learning and decision-making through low-latency networks, and (c) securing hardware designs using runtime decision-making. I propose an always-awake and active learning ‘hardware nervous system’, pervasive throughout the chip, that can reason about the performance, energy usage, and security of individual hardware modules.
I present the design and implementation of (1) a reference architecture and (2) a microarchitecture-aware static binary instrumentation tool. Finally, I provide results showing (1) that runtime adaptation is necessary to continue improving performance on general-purpose tasks, (2) that significant performance loss and performance variation happen below the ISA level and are unobservable without hardware support, and (3) that hardware must possess decision-making and ‘self-awareness’ capabilities at the microarchitecture level in order to efficiently use its own faculties.
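The effect of time spent in the observation-analysis-adaptation cycle can be illustrated with a toy model. The sketch below is not taken from the dissertation: the two-phase workload, the function name `matched_fraction`, and all parameters are assumptions made purely for illustration. It counts how often the active configuration matches the current workload phase when each adaptation decision takes a fixed number of steps to take effect.

```python
def matched_fraction(phase_length, decision_latency, total_steps):
    """Fraction of steps in which the active configuration suits the workload.

    Hypothetical model: the workload alternates between phase 0 and phase 1
    every `phase_length` steps, and configuration k is ideal for phase k.
    """
    config = 0       # configuration currently applied
    pending = None   # (step at which it takes effect, new configuration)
    matched = 0
    for step in range(total_steps):
        phase = (step // phase_length) % 2
        # Adaptation: apply a decision once its latency has elapsed.
        if pending is not None and step >= pending[0]:
            config = pending[1]
            pending = None
        # Observation and analysis: on a mismatch, start a new decision.
        if config != phase and pending is None:
            pending = (step + decision_latency, phase)
        if config == phase:
            matched += 1
    return matched / total_steps


if __name__ == "__main__":
    for latency in (1, 50):
        print(latency, matched_fraction(100, latency, 1000))
```

In this model, with 100-step phases over 1000 steps, a 1-step decision latency keeps the configuration matched on 99.1% of steps, while a 50-step latency drops that to 55%. Real systems are far more complex, but the model shows why a fast decision path can matter as much as a good decision policy.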

Code Generation Techniques For Emerging Capability Architectures

Memory safety and security issues continue to plague modern systems and are rapidly becoming a top priority. Capability architectures are a proposed solution that solves the problem at a fundamental hardware level, with several commercially viable options under active development. These new and evolving designs place higher demands on the software tools used to develop software and ensure correct execution. Capabilities introduce ideas that challenge typical architecture assumptions about the representation of data and its location in memory. This calls for a new core system software ecosystem. A fundamental component of any software ecosystem is a compiler. Without a compiler, large critical components of the ecosystem must be written in assembly language, a tedious and error-prone task. A compiler for a capability architecture that emphasizes memory security must above all else ensure functional and correct code generation; raw performance and power efficiency are no longer the chief concerns. Compilers for these architectures have been developed, but as capability architectures mature in complexity, new compilation support is required. A set of techniques that help solve the compilation challenges for a capability architecture is presented in this work. These capability-aware compiler ideas are presented in their generalized forms to enable their adoption in other architectures and future extensions. Some of the ideas presented come out of work on a compiler for a new capability architecture, Zeno. The Zeno compiler uses the extensible RISC-V instruction set and adds a set of global memory extensions, xBGAS (Extended Base Global Address Space), which are used to provide memory security. The Zeno compiler is described in detail as an implementation of the generalized capability-aware compiler. Static analysis is used to evaluate the generated assembly code produced by the compiler.
Rather than focusing on the runtime performance of code generated by the Zeno compiler, this work evaluates the compiler based on a static analysis of the generated assembly code. We find the code produced by the Zeno compiler sufficient to enable further testing of the Zeno architecture and drive its development.
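To make concrete how a capability differs from a plain pointer, the following is a minimal sketch of a generic capability as a bounded, permissioned reference. It is an illustration only: the `Capability` class, its fields, and the `load`/`store` helpers are hypothetical and do not describe the Zeno or xBGAS encoding or the code the Zeno compiler emits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical capability: a bounded, permissioned reference to memory."""
    base: int        # first address the holder may touch
    length: int      # number of bytes covered
    can_read: bool
    can_write: bool

    def address(self, offset: int, size: int, write: bool) -> int:
        """Return the absolute address, or raise if the access is not allowed."""
        if offset < 0 or size <= 0 or offset + size > self.length:
            raise IndexError("access outside capability bounds")
        if write and not self.can_write:
            raise PermissionError("capability does not grant write")
        if not write and not self.can_read:
            raise PermissionError("capability does not grant read")
        return self.base + offset

    def narrow(self, offset: int, length: int, can_write: bool) -> "Capability":
        """Derive a capability whose bounds and rights never exceed the parent's."""
        if offset < 0 or length < 0 or offset + length > self.length:
            raise IndexError("derived bounds exceed parent bounds")
        return Capability(self.base + offset, length, self.can_read,
                          self.can_write and can_write)


def load(memory: bytearray, cap: Capability, offset: int) -> int:
    """Read one byte through a capability."""
    return memory[cap.address(offset, 1, write=False)]


def store(memory: bytearray, cap: Capability, offset: int, value: int) -> None:
    """Write one byte through a capability."""
    memory[cap.address(offset, 1, write=True)] = value


memory = bytearray(64)
buf = Capability(base=16, length=8, can_read=True, can_write=True)
store(memory, buf, 0, 7)
read_only = buf.narrow(0, 4, can_write=False)  # smaller, read-only view
```

Because every access carries bounds and rights with it, a code generator for such a machine can no longer treat an address as a bare integer, which is one reason correctness rather than raw speed becomes the first concern.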

Eleatic: Secure Architecture Across the Edge-to-Cloud Continuum

Many companies face pressure to deploy flexible compute infrastructures to manage their operations. However, the current developments in cloud and edge computing have created a data processing asymmetry challenge. On the edge, workloads frequently require low-latency responses, contend with connectivity and bandwidth instabilities, may require privacy guarantees, and may run on limited or high-variance compute resources. In the cloud, workloads tolerate longer latency, expect highly available infrastructure, access high-performance compute resources, and have more power available, but may be further from where the processing results are needed. This compute asymmetry challenge requires a new computational paradigm. In this work, I advance a new computing architecture model, called the Continuum Computing Architecture (CCA), and validate this model with a candidate architecture. CCA is a unifying edge-fog-cloud computing model that provides the following capabilities: (i) a continuum of compute that spans from network-connected edge devices to the cloud, ranging from very low power consumption to high-performance compute; (ii) the same architecture with different micro-architectures along this compute continuum, namely a single RISC-V instruction set architecture with reconfigurable processing units; (iii) portability across all scales, so that the same program can run across the continuum with different latencies and power utilizations; and (iv) fully supported secure shared memory, in which physical memories along the continuum are abstracted to allow edge and cloud to share data in a transparent fashion. The validating architecture has three micro-architectures. The edge micro-architecture, Parmenides, targets accelerator-based edge processing system-on-chips (SoCs). Parmenides includes security features to protect the SoC in uncontrolled environments while adapting its power usage and processing to ambient events.
The fog and cloud micro-architectures, Melissus and Zeno, must support application data distribution across the memory of many compute nodes to achieve the desired scale and performance. As a solution, I introduce the Eleatic Memory Model (EMM): a global shared memory architecture with hardware-supported global memory access permissions. All memory accesses are made with a Namespace-based capability scheme that supports improved scalability and memory security. The CCA model addresses several memory-centric security challenges including the misuse of resources, risk to application and data integrity, as well as concerns over authorization and confidentiality.
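The idea of namespace-checked access to a global shared memory can be sketched in a few lines. This is a toy software model under stated assumptions, not the Eleatic Memory Model itself: the `GlobalMemory` class, the per-word ownership table, and the string namespace names are all hypothetical, and the real design performs these checks in hardware.

```python
class GlobalMemory:
    """Toy global address space: (node, offset) pairs guarded by a namespace."""

    def __init__(self, node_count, words_per_node):
        # One flat word array per compute node.
        self.nodes = [[0] * words_per_node for _ in range(node_count)]
        # Hypothetical ownership table: (node, offset) -> owning namespace.
        self.owner = {}

    def allocate(self, namespace, node, offset, length):
        """Assign a range of words on one node to a namespace."""
        for i in range(offset, offset + length):
            self.owner[(node, i)] = namespace

    def _check(self, namespace, node, offset):
        # Access is allowed only to words owned by the caller's namespace.
        if self.owner.get((node, offset)) != namespace:
            raise PermissionError("namespace may not access this word")

    def load(self, namespace, node, offset):
        self._check(namespace, node, offset)
        return self.nodes[node][offset]

    def store(self, namespace, node, offset, value):
        self._check(namespace, node, offset)
        self.nodes[node][offset] = value


gmem = GlobalMemory(node_count=4, words_per_node=16)
gmem.allocate("app_a", node=1, offset=0, length=4)
gmem.store("app_a", 1, 0, 42)   # permitted: app_a owns this word on node 1
```

Here a second namespace, say `"app_b"`, that tries to load the same word receives a `PermissionError`, which mirrors in miniature the authorization and confidentiality concerns the abstract describes.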