Spatial audio can be especially useful for directing human attention. However, delivering spatial audio through speakers, rather than headphones that deliver audio directly to the ears, produces the issue of crosstalk, where sounds from each of the two speakers reach…
Spatial audio can be especially useful for directing human attention. However, delivering spatial audio through speakers, rather than headphones that deliver audio directly to the ears, produces the issue of crosstalk, where sounds from each of the two speakers reach the opposite ear, inhibiting the spatialized effect. A research team at Meteor Studio has developed an algorithm called Xblock that solves this issue using a crosstalk cancellation technique. This thesis project expands upon the existing Xblock IoT system by providing a way to test the accuracy of the directionality of sounds generated with spatial audio. More specifically, the objective is to determine whether the usage of Xblock with smart speakers can provide generalized audio localization, which refers to the ability to detect a general direction of where a sound might be coming from. This project also expands upon the existing Xblock technique to integrate voice commands, where users can verbalize the name of a lost item using the phrase, “Find [item]”, and the IoT system will use spatial audio to guide them to it.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
Spatial audio can be especially useful for directing human attention. However, delivering spatial audio through speakers, rather than headphones that deliver audio directly to the ears, produces the issue of crosstalk, where sounds from each of the two speakers reach…
Spatial audio can be especially useful for directing human attention. However, delivering spatial audio through speakers, rather than headphones that deliver audio directly to the ears, produces the issue of crosstalk, where sounds from each of the two speakers reach the opposite ear, inhibiting the spatialized effect. A research team at Meteor Studio has developed an algorithm called Xblock that solves this issue using a crosstalk cancellation technique. This thesis project expands upon the existing Xblock IoT system by providing a way to test the accuracy of the directionality of sounds generated with spatial audio. More specifically, the objective is to determine whether the usage of Xblock with smart speakers can provide generalized audio localization, which refers to the ability to detect a general direction of where a sound might be coming from. This project also expands upon the existing Xblock technique to integrate voice commands, where users can verbalize the name of a lost item using the phrase, “Find [item]”, and the IoT system will use spatial audio to guide them to it.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
I present my work on a scalable and programmable I/O controller for region-based computing, which will be used in a rhythmic pixel-based camera pipeline. I provide a breakdown of the development and design of the I/O controller and how it…
I present my work on a scalable and programmable I/O controller for region-based computing, which will be used in a rhythmic pixel-based camera pipeline. I provide a breakdown of the development and design of the I/O controller and how it fits in to rhythmic pixel regions, along with a studyon memory traffic of rhythmic pixel regions and how this translates to energy efficiency. This rhythmic pixel region-based camera pipeline has been jointly developed through Dr. Robert LiKamWa’s research lab. High spatiotemporal resolutions allow high precision for vision applications, such as for detecting features for augmented reality or face detection. High spatiotemporal resolution also comes with high memory throughput, leading to higher energy usage. This creates a tradeoff between high precision and energy efficiency, which becomes more important in mobile systems. In addition, not all pixels in a frame are necessary for the vision application, such as pixels that make up the background. Rhythmic pixel regions aim to reduce the tradeoff by creating a pipeline that allows an application developer to specify regions to capture at a non-uniform spatiotemporal resolution. This is accomplished by encoding the incoming image, and only sending the pixels within these specified regions. Later these encoded representations will be decoded to a standard frame representation usable by traditional vision applications. My contribution to this effort has been the design, testing and evaluation of the I/O controller.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
Commonly, image processing is handled on a CPU that is connected to the image sensor by a wire. In these far-sensor processing architectures, there is energy loss associated with sending data across an interconnect from the sensor to the CPU.…
Commonly, image processing is handled on a CPU that is connected to the image sensor by a wire. In these far-sensor processing architectures, there is energy loss associated with sending data across an interconnect from the sensor to the CPU. In an effort to increase energy efficiency, near-sensor processing architectures have been developed, in which the sensor and processor are stacked directly on top of each other. This reduces energy loss associated with sending data off-sensor. However, processing near the image sensor causes the sensor to heat up. Reports of thermal noise in near-sensor processing architectures motivated us to study how temperature affects image quality on a commercial image sensor and how thermal noise affects computer vision task accuracy. We analyzed image noise across nine different temperatures and three sensor configurations to determine how image noise responds to an increase in temperature. Ultimately, our team used this information, along with transient analysis of a stacked image sensor’s thermal behavior, to advise thermal management strategies that leverage the benefits of near-sensor processing and prevent accuracy loss at problematic temperatures.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
Visualizations are an integral component for communicating and evaluating modern networks. As data becomes more complex, info-graphics require a balance between visual noise and effective storytelling that is often restricted by layouts unsuitable for scalability. The challenge then rests upon…
Visualizations are an integral component for communicating and evaluating modern networks. As data becomes more complex, info-graphics require a balance between visual noise and effective storytelling that is often restricted by layouts unsuitable for scalability. The challenge then rests upon researchers to effectively structure their information in a way that allows for flexible, transparent illustration. We propose network graphing as an operative alternative for demonstrating community behavior over traditional charts which are unable to look past numeric data. In this paper, we explore methods for manipulating, processing, cleaning, and aggregating data in Python; a programming language tailored for handling structured data, which can then be formatted for analysis and modeling of social network tendencies in Gephi. We implement this data by applying an algorithm known as the Fruchterman-Reingold force-directed layout to datasets of Arizona State University’s research and collaboration network. The result is a visualization that analyzes the university’s infrastructure by providing insight about community behaviors between colleges. Furthermore, we highlight how the flexibility of this visualization provides a foundation for specific use cases by demonstrating centrality measures to find important liaisons that connect distant communities.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
As the prevalence of augmented reality (AR) technology continues to increase, so too have methods for improving the appearance and behavior of computer-generated objects. This is especially significant as AR applications now expand to territories outside of the entertainment…
As the prevalence of augmented reality (AR) technology continues to increase, so too have methods for improving the appearance and behavior of computer-generated objects. This is especially significant as AR applications now expand to territories outside of the entertainment sphere and can be utilized for numerous purposes encompassing but not limited to education, specialized occupational training, retail & online shopping, design, marketing, and manufacturing. Due to the nature of AR technology, where computer-generated objects are being placed into a real-world environment, a decision has to be made regarding the visual connection between the tangible and the intangible. Should the objects blend seamlessly into their environment or purposefully stand out? It is not purely a stylistic choice. A developer must consider how their application will be used — in many instances an optimal user experience is facilitated by mimicking the real world as closely as possible; even simpler applications, such as those built primarily for mobile devices, can benefit from realistic AR. The struggle here lies in creating an immersive user experience that is not reliant on computationally-expensive graphics or heavy-duty models. The research contained in this thesis provides several ways for achieving photorealistic rendering in AR applications using a range of techniques, all of which are supported on mobile devices. These methods can be employed within the Unity Game Engine and incorporate shaders, render pipelines, node-based editors, post-processing, and light estimation.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
This thesis addresses the problem of recommending a viewpoint for aesthetic photography. Viewpoint recommendation is suggesting the best camera pose to capture a visually pleasing photograph of the subject of interest by using any end-user device such as drone, mobile…
This thesis addresses the problem of recommending a viewpoint for aesthetic photography. Viewpoint recommendation is suggesting the best camera pose to capture a visually pleasing photograph of the subject of interest by using any end-user device such as drone, mobile robot or smartphone. Solving this problem enables to capture visually pleasing photographs autonomously in areal photography, wildlife photography, landscape photography or in personal photography.
The viewpoint recommendation problem can be divided into two stages: (a) generating a set of dense novel views based on the basis views captured about the subject. The dense novel views are useful to better understand the scene and to know how the subject looks from different viewpoints and (b) each novel is scored based on how aesthetically good it is. The viewpoint with the greatest aesthetic score is recommended for capturing a visually pleasing photograph.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
This thesis presents efficient implementations of several linear algebra kernels, machine learning kernels and a neural network based recommender systems engine onto a massively parallel reconfigurable architecture, Transformer. The linear algebra kernels include Triangular Matrix Solver (TRSM), LU Decomposition (LUD),…
This thesis presents efficient implementations of several linear algebra kernels, machine learning kernels and a neural network based recommender systems engine onto a massively parallel reconfigurable architecture, Transformer. The linear algebra kernels include Triangular Matrix Solver (TRSM), LU Decomposition (LUD), QR Decomposition (QRD), and Matrix Inversion. The machine learning kernels include an LSTM (Long Short Term Memory) cell, and a GRU (gated Recurrent Unit) cell used in recurrent neural networks. The neural network based recommender systems engine consists of multiple kernels including fully connected layers, embedding layer, 1-D batchnorm, Adam optimizer, etc.
Transformer is a massively parallel reconfigurable multicore architecture designed at the University of Michigan. The Transformer configuration considered here is 4 tiles and 16 General Processing Elements (GPEs) per tile. It supports a two level cache hierarchy where the L1 and L2 caches can operate in shared (S) or private (P) modes. The architecture was modeled using Gem5 and cycle accurate simulations were done to evaluate the performance in terms of execution times, giga-operations per second per Watt (GOPS/W), and giga-floating-point-operations per second per Watt (GFLOPS/W).
This thesis shows that for linear algebra kernels, each kernel achieves high performance for a certain cache mode and that this cache mode can change when the matrix size changes. For instance, for smaller matrix sizes, L1P, L2P cache mode is best for TRSM, while L1S, L2S is the best cache mode for LUD, and L1P, L2S is the best for QRD. For each kernel, the optimal cache mode changes when the matrix size is increased. For instance, for TRSM, the L1P, L2P cache mode is best for smaller matrix sizes ($N=64, 128, 256, 512$) and it changes to L1S, L2P for larger matrix sizes ($N=1024$). For machine learning kernels, L1P, L2P is the best cache mode for all network parameter sizes.
Gem5 simulations show that the peak performance for TRSM, LUD, QRD and Matrix Inverse in the 14nm node is 97.5, 59.4, 133.0 and 83.05 GFLOPS/W, respectively. For LSTM and GRU, the peak performance is 44.06 and 69.3 GFLOPS/W.
The neural network based recommender system was implemented in L1S, L2S cache mode. It includes a forward pass and a backward pass and is significantly more complex in terms of both computational complexity and data movement. The most computationally intensive block is the fully connected layer followed by Adam optimizer. The overall performance of the recommender systems engine is 54.55 GFLOPS/W and 169.12 GOPS/W.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
Visual applications – those that use camera frames as part of the application – provide a rich, context-aware experience. The continued development of mixed and augmented reality (MR/AR) computing environments furthers the richness of this experience by providing applications a…
Visual applications – those that use camera frames as part of the application – provide a rich, context-aware experience. The continued development of mixed and augmented reality (MR/AR) computing environments furthers the richness of this experience by providing applications a continuous vision experience, where visual information continuously provides context for applications and the real world is augmented by the virtual. To understand user privacy concerns in continuous vision computing environments, this work studies three MR/AR applications (augmented markers, augmented faces, and text capture) to show that in a modern mobile system, the typical user is exposed to potential mass collection of sensitive information, posing privacy and security deficiencies to be addressed in future systems.
To address such deficiencies, a development framework is proposed that provides resource isolation between user information contained in camera frames and application access to the network. The design is implemented using existing system utilities as a proof of concept on the Android operating system and demonstrates its viability with a modern state-of-the-art augmented reality library and several augmented reality applications. Evaluation is conducted on the design on a Samsung Galaxy S8 phone by comparing the applications from the case study with modified versions which better protect user privacy. Early results show that the new design efficiently protects users against data collection in MR/AR applications with less than 0.7% performance overhead.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
Acoustic Ecology is an undervalued field of study of the relationship between the environment and sound. This project aims to educate people on this topic and show people the importance by immersing them in virtual reality scenes. The scenes were created using VR180 content as well as 360° spatial audio.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)