Determining the integrity of applications and operating systems using remote and local attesters

149851-Thumbnail Image.png
Description
This research describes software based remote attestation schemes for obtaining the integrity of an executing user application and the Operating System (OS) text section of an untrusted client platform. A trusted external entity issues a challenge to the client platform.

This research describes software based remote attestation schemes for obtaining the integrity of an executing user application and the Operating System (OS) text section of an untrusted client platform. A trusted external entity issues a challenge to the client platform. The challenge is executable code which the client must execute, and the code generates results which are sent to the external entity. These results provide the external entity an assurance as to whether the client application and the OS are in pristine condition. This work also presents a technique where it can be verified that the application which was attested, did not get replaced by a different application after completion of the attestation. The implementation of these three techniques was achieved entirely in software and is backward compatible with legacy machines on the Intel x86 architecture. This research also presents two approaches to incorporating software based "root of trust" using Virtual Machine Monitors (VMMs). The first approach determines the integrity of an executing Guest OS from the Host OS using Linux Kernel-based Virtual Machine (KVM) and qemu emulation software. The second approach implements a small VMM called MIvmm that can be utilized as a trusted codebase to build security applications such as those implemented in this research. MIvmm was conceptualized and implemented without using any existing codebase; its minimal size allows it to be trustworthy. Both the VMM approaches leverage processor support for virtualization in the Intel x86 architecture.
Date Created
2011
Agent

A New RNS 4-moduli set for the implementation of FIR filters

149788-Thumbnail Image.png
Description
Residue number systems have gained significant importance in the field of high-speed digital signal processing due to their carry-free nature and speed-up provided by parallelism. The critical aspect in the application of RNS is the selection of the moduli set

Residue number systems have gained significant importance in the field of high-speed digital signal processing due to their carry-free nature and speed-up provided by parallelism. The critical aspect in the application of RNS is the selection of the moduli set and the design of the conversion units. There have been several RNS moduli sets proposed for the implementation of digital filters. However, some are unbalanced and some do not provide the required dynamic range. This thesis addresses the drawbacks of existing RNS moduli sets and proposes a new moduli set for efficient implementation of FIR filters. An efficient VLSI implementation model has been derived for the design of a reverse converter from RNS to the conventional two's complement representation. This model facilitates the realization of a reverse converter for better performance with less hardware complexity when compared with the reverse converter designs of the existing balanced 4-moduli sets. Experimental results comparing multiply and accumulate units using RNS that are implemented using the proposed four-moduli set with the state-of-the-art balanced four-moduli sets, show large improvements in area (46%) and power (43%) reduction for various dynamic ranges. RNS FIR filters using the proposed moduli-set and existing balanced 4-moduli set are implemented in RTL and compared for chip area and power and observed 20% improvements. This thesis also presents threshold logic implementation of the reverse converter.
Date Created
2011
Agent

Topics in power and performance optimization of embedded systems

149617-Thumbnail Image.png
Description
The ubiquity of embedded computational systems has exploded in recent years impacting everything from hand-held computers and automotive driver assistance to battlefield command and control and autonomous systems. Typical embedded computing systems are characterized by highly resource constrained operating environments.

The ubiquity of embedded computational systems has exploded in recent years impacting everything from hand-held computers and automotive driver assistance to battlefield command and control and autonomous systems. Typical embedded computing systems are characterized by highly resource constrained operating environments. In particular, limited energy resources constrain performance in embedded systems often reliant on independent fuel or battery supplies. Ultimately, mitigating energy consumption without sacrificing performance in these systems is paramount. In this work power/performance optimization emphasizing prevailing data centric applications including video and signal processing is addressed for energy constrained embedded systems. Frameworks are presented which exchange quality of service (QoS) for reduced power consumption enabling power aware energy management. Power aware systems provide users with tools for precisely managing available energy resources in light of user priorities, extending availability when QoS can be sacrificed. Specifically, power aware management tools for next generation bistable electrophoretic displays and the state of the art H.264 video codec are introduced. The multiprocessor system on chip (MPSoC) paradigm is examined in the context of next generation many-core hand-held computing devices. MPSoC architectures promise to breach the power/performance wall prohibiting advancement of complex high performance single core architectures. Several many-core distributed memory MPSoC architectures are commercially available, while the tools necessary to effectively tap their enormous potential remain largely open for discovery. Adaptable scalability in many-core systems is addressed through a scalable high performance multicore H.264 video decoder implemented on the representative Cell Broadband Engine (CBE) architecture. The resulting agile performance scalable system enables efficient adaptive power optimization via decoding-rate driven sleep and voltage/frequency state management. The significant problem of mapping applications onto these architectures is additionally addressed from the perspective of instruction mapping for limited distributed memory architectures with a code overlay generator implemented on the CBE. Finally runtime scheduling and mapping of scalable applications in multitasking environments is addressed through the introduction of a lightweight work partitioning framework targeting streaming applications with low latency and near optimal throughput demonstrated on the CBE.
Date Created
2011
Agent

UnSync: a soft error resilient redundant CMP architecture

149560-Thumbnail Image.png
Description
Reducing device dimensions, increasing transistor densities, and smaller timing windows, expose the vulnerability of processors to soft errors induced by charge carrying particles. Since these factors are inevitable in the advancement of processor technology, the industry has been forced to

Reducing device dimensions, increasing transistor densities, and smaller timing windows, expose the vulnerability of processors to soft errors induced by charge carrying particles. Since these factors are inevitable in the advancement of processor technology, the industry has been forced to improve reliability on general purpose Chip Multiprocessors (CMPs). With the availability of increased hardware resources, redundancy based techniques are the most promising methods to eradicate soft error failures in CMP systems. This work proposes a novel customizable and redundant CMP architecture (UnSync) that utilizes hardware based detection mechanisms (most of which are readily available in the processor), to reduce overheads during error free executions. In the presence of errors (which are infrequent), the always forward execution enabled recovery mechanism provides for resilience in the system. The inherent nature of UnSync architecture framework supports customization of the redundancy, and thereby provides means to achieve possible performance-reliability trade-offs in many-core systems. This work designs a detailed RTL model of UnSync architecture and performs hardware synthesis to compare the hardware (power/area) overheads incurred. It then compares the same with those of the Reunion technique, a state-of-the-art redundant multi-core architecture. This work also performs cycle-accurate simulations over a wide range of SPEC2000, and MiBench benchmarks to evaluate the performance efficiency achieved over that of the Reunion architecture. Experimental results show that, UnSync architecture reduces power consumption by 34.5% and improves performance by up to 20% with 13.3% less area overhead, when compared to Reunion architecture for the same level of reliability achieved.
Date Created
2011
Agent

Modeling and implementation of threshold logic circuits and architectures

149343-Thumbnail Image.png
Description
Threshold logic has long been studied as a means of achieving higher performance and lower power dissipation, providing improvements by condensing simple logic gates into more complex primitives, effectively reducing gate count, pipeline depth, and number of interconnects. This work

Threshold logic has long been studied as a means of achieving higher performance and lower power dissipation, providing improvements by condensing simple logic gates into more complex primitives, effectively reducing gate count, pipeline depth, and number of interconnects. This work proposes a new physical implementation of threshold logic, the threshold logic latch (TLL), which overcomes the difficulties observed in previous work, particularly with respect to gate reliability in the presence of noise and process variations. Simple but effective models were created to assess the delay, power, and noise margin of TLL gates for the purpose of determining the physical parameters and assignment of input signals that achieves the lowest delay subject to constraints on power and reliability. From these models, an optimized library of standard TLL cells was developed to supplement a commercial library of static CMOS gates. The new cells were then demonstrated on a number of automatically synthesized, placed, and routed designs. A two-stage 2's complement integer multiplier designed with CMOS and TLL gates utilized 19.5% less area, 28.0% less active power, and 61.5% less leakage power than an equivalent design with the same performance using only static CMOS gates. Additionally, a two-stage 32-instruction 4-way issue queue designed with CMOS and TLL gates utilized 30.6% less area, 31.0% less active power, and 58.9% less leakage power than an equivalent design with the same performance using only static CMOS gates.
Date Created
2010
Agent