Traditional sports coaching involves face-to-face instruction with athletes or playing back 2D videos of athletes’ training. However, if the coach is not in the same location as the athlete, the coach cannot see the athlete’s full body and thus cannot give precise guidance, limiting the athlete’s improvement. To address these challenges, this paper proposes Augmented Coach, an augmented reality platform where coaches can remotely view, manipulate and comment on volumetric video data of athletes’ movements over the network. In particular, this work includes (a) capturing the athlete’s movement with Kinects and converting it into point cloud format, (b) transmitting the point cloud data to the coach’s Oculus headset via a 5G or wireless network, and (c) letting the coach comment on the athlete’s joints. In addition, the evaluation of Augmented Coach assesses its performance on five metrics in both wireless and 5G network environments, as well as coaches’ and athletes’ experience of using it. The results show that Augmented Coach enables coaches to instruct athletes from a distance and provide effective feedback for correcting athletes’ motions over the network.
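As a rough illustration of the capture-and-transmit step described above, the sketch below back-projects a depth frame into a point cloud and streams it over a TCP socket. It is a minimal sketch only: the intrinsics, serialization format, host name and port are hypothetical and not the ones Augmented Coach actually uses.

```python
import socket
import struct

import numpy as np


def frame_to_point_cloud(depth_frame, fx, fy, cx, cy):
    """Back-project a depth frame (H x W, in metres) into an N x 3 point cloud.

    The intrinsics (fx, fy, cx, cy) would come from the Kinect calibration;
    here they are plain arguments for illustration.
    """
    h, w = depth_frame.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_frame
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no depth reading


def send_point_cloud(points, host="coach-headset.local", port=9000):
    """Stream one point-cloud frame over TCP with a 4-byte length prefix."""
    payload = points.astype(np.float32).tobytes()
    with socket.create_connection((host, port)) as sock:
        sock.sendall(struct.pack("!I", len(payload)))
        sock.sendall(payload)
```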
Detection of anomalies before they are included in downstream diagnosis/prognosis models is an important criterion for maintaining the performance of medical AI imaging models across internal and external datasets. Furthermore, the need to curate huge amounts of data to train supervised models that produce precise results also requires an automated model that can accurately identify in-distribution (ID) and out-of-distribution (OOD) data to ensure training dataset quality. However, the core challenges in designing such a system are: (i) given the infinite variations of the anomaly, curating training data is infeasible; and (ii) any assumption about the types of anomalies is often hypothetical. The proposed work designs an unsupervised anomaly detection model using a cascade variational autoencoder coupled with a zero-shot learning network that maps the latent vectors to semantic attributes. The performance of the proposed model is demonstrated on two use cases, skin images and chest radiographs, and is compared against state-of-the-art generative OOD detection models of the same class.
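The abstract does not give the exact architecture, so the following PyTorch sketch only illustrates the general idea of coupling a VAE latent space with a semantic-attribute head and scoring OOD samples; the layer sizes, attribute dimensionality and scoring rule are assumptions, not the proposed cascade model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VAEWithAttributes(nn.Module):
    """Toy VAE whose latent vector is also mapped to semantic attributes."""

    def __init__(self, in_dim=784, latent_dim=32, n_attributes=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))
        # zero-shot head: latent vector -> semantic attribute predictions
        self.attr_head = nn.Linear(latent_dim, n_attributes)

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), self.attr_head(z), mu, logvar


def ood_score(model, x, expected_attrs):
    """Higher score = more likely out-of-distribution."""
    recon, attrs, _, _ = model(x)
    recon_err = F.mse_loss(recon, x, reduction="none").mean(dim=1)
    attr_err = F.mse_loss(torch.sigmoid(attrs), expected_attrs,
                          reduction="none").mean(dim=1)
    return recon_err + attr_err
```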
Simultaneous localization and mapping (SLAM) has traditionally relied on low-level geometric or optical features. However, these feature-based SLAM methods often struggle with featureless or repetitive scenes. Additionally, low-level features may not provide sufficient information for robot navigation and manipulation, leaving robots without a complete understanding of the 3D spatial world. Higher-level information is necessary to address these limitations. Fortunately, recent developments in learning-based 3D reconstruction allow robots not only to detect semantic meaning, but also to recognize the 3D structure of objects from a few images. By incorporating this 3D structural information, SLAM can be improved from a low-level approach to a structure-aware approach. This work proposes a novel approach for multi-view 3D reconstruction using a recurrent transformer. This approach allows robots to accumulate information from multiple views and encode it into a compact latent space. The resulting latent representations are then decoded to produce 3D structural landmarks, which can be used to improve robot localization and mapping.
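A minimal sketch of the view-accumulation idea, assuming per-view patch features have already been extracted by some backbone; the token counts, update rule and landmark decoder below are illustrative and not the architecture proposed in this work.

```python
import torch
import torch.nn as nn


class RecurrentViewEncoder(nn.Module):
    """Sketch: fold per-view image features into one latent code, one view
    at a time, using a transformer encoder layer as the update rule."""

    def __init__(self, feat_dim=256, latent_tokens=8):
        super().__init__()
        self.latent = nn.Parameter(torch.zeros(latent_tokens, feat_dim))
        self.update = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                                 batch_first=True)
        self.landmark_decoder = nn.Linear(feat_dim, 3)  # latent token -> 3D point

    def forward(self, view_features):
        # view_features: (batch, n_views, n_patches, feat_dim)
        b = view_features.shape[0]
        latent = self.latent.unsqueeze(0).expand(b, -1, -1)
        for v in range(view_features.shape[1]):
            # concatenate current latent with this view's tokens, then refine
            tokens = torch.cat([latent, view_features[:, v]], dim=1)
            latent = self.update(tokens)[:, :latent.shape[1]]
        return self.landmark_decoder(latent)            # (batch, tokens, 3)
```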
Computed tomography (CT) and synthetic aperture sonar (SAS) are tomographic imaging techniques that are fundamental to medical imaging and remote sensing applications. Despite their successes, a number of factors constrain their image quality. For example, a scene that varies during measurement acquisition yields image artifacts. Additionally, factors such as bandlimited or sparse measurements limit image resolution. This thesis presents novel algorithms and techniques that account for these factors during image formation and outperform traditional reconstruction methods. In particular, this thesis formulates analysis-by-synthesis optimizations that leverage neural fields to predict the scene and differentiable physics models that incorporate prior knowledge about image formation. The specific contributions include: (1) a method for reconstructing CT measurements from time-varying (non-stationary) scenes; (2) a method for deconvolving SAS images, which improves image quality; and (3) a method that couples neural fields with a differentiable acoustic model for 3D SAS reconstructions.
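As a sketch of the analysis-by-synthesis formulation, assuming a generic differentiable forward model stands in for the CT projector or acoustic model (which the thesis implements but the abstract does not detail), the optimization loop might look like:

```python
import torch
import torch.nn as nn


class NeuralField(nn.Module):
    """Coordinate MLP: maps (x, y[, t]) -> scene value (e.g. attenuation)."""

    def __init__(self, in_dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, coords):
        return self.net(coords)


def reconstruct(measurements, forward_model, coords, steps=2000, lr=1e-3):
    """Analysis-by-synthesis loop: fit the field so that simulating the
    (differentiable) sensor physics reproduces the measurements.

    `forward_model` is a placeholder for the differentiable CT projector
    or acoustic model; it is assumed, not provided, here."""
    field = NeuralField(in_dim=coords.shape[-1])
    opt = torch.optim.Adam(field.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        predicted = forward_model(field(coords))   # synthesize measurements
        loss = torch.mean((predicted - measurements) ** 2)
        loss.backward()
        opt.step()
    return field
```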
Finding Flow was inspired by a previous research project, Zen and the Art of STEAM. The concept of flow was developed by Mihaly Csikszentmihalyi and can be described as "being in the zone." The previous research project focused on digital culture students and whether they could find states of flow within their coursework. This thesis project aimed to develop a website prototype that could be used to help students who struggled to find flow.
Computer vision is becoming an essential component of embedded system applications such as smartphones, wearables, autonomous systems and the internet-of-things (IoT). These applications are generally deployed in environments with limited energy, memory bandwidth and computational resources. This trend is driving the development of energy-efficient image processing solutions from sensing to computation. In this thesis, different alternatives are explored for implementing energy-efficient computer vision systems. First, I present a field programmable gate array (FPGA) implementation of an adaptive subsampling algorithm for region-of-interest (ROI) based object tracking. By implementing the computationally intensive sections of this algorithm on an FPGA, I aim to offload computation from energy-inefficient graphics processing units (GPUs) and/or general-purpose central processing units (CPUs). I also present a working system executing this algorithm at near real-time latency on a standalone embedded device. Secondly, I present a neural network-based pipeline to improve the performance of event-based cameras in non-ideal optical conditions. Event-based cameras, or dynamic vision sensors (DVS), are bio-inspired sensors that measure logarithmic per-pixel brightness changes in a scene. Their advantages include high dynamic range, low latency and ultra-low power consumption compared to standard frame-based cameras. Several tasks have been proposed to take advantage of these novel sensors, but they rely on perfectly calibrated, in-focus optical lenses. In this work I propose a method to reconstruct events captured with an out-of-focus event camera so they can be fed into an intensity reconstruction task. The network is trained with a dataset generated by simulating defocus blur in sequences from object tracking datasets such as LaSOT and OTB100. I also test the generalization performance of this network on scenes captured with a DAVIS event-based sensor equipped with an out-of-focus lens.
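A minimal software sketch of ROI-based adaptive subsampling, assuming a simple uniform stride outside the ROI; the FPGA implementation and the exact subsampling policy used in the thesis are not reproduced here.

```python
import numpy as np


def adaptive_subsample(frame, roi, background_stride=4):
    """Keep the region of interest at full resolution and coarsely
    subsample everything else (detail a tracker rarely needs).

    frame : H x W (grayscale) array
    roi   : (row, col, height, width) of the tracked object
    """
    r, c, h, w = roi
    coarse = frame[::background_stride, ::background_stride]
    out = coarse.repeat(background_stride, axis=0).repeat(background_stride, axis=1)
    out = out[:frame.shape[0], :frame.shape[1]]          # crop back to frame size
    out[r:r + h, c:c + w] = frame[r:r + h, c:c + w]      # restore ROI detail
    return out
```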
It is not merely an aggregation of static entities that a video clip carries, but also a variety of interactions and relations among these entities. Challenges remain for a video captioning system to generate natural language descriptions that focus on the prominent interest and align with latent aspects beyond direct observation. This work presents a Commonsense knowledge Anchored Video cAptioNing (dubbed CAVAN) approach. CAVAN exploits inferential commonsense knowledge to assist the training of a video captioning model with a novel paradigm for sentence-level semantic alignment. Specifically, commonsense knowledge is queried from the generic knowledge atlas ATOMIC to complement each training caption, forming a commonsense-caption entailment corpus. A BERT-based language entailment model trained on this corpus then serves as a commonsense discriminator for the training of the video captioning model, penalizing the model for generating semantically misaligned captions. In extensive empirical evaluations on the MSR-VTT, V2C and VATEX datasets, CAVAN consistently improves the quality of generations and shows a higher keyword hit rate. Experimental results with ablations validate the effectiveness of CAVAN and reveal that the use of commonsense knowledge contributes to video caption generation.
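A minimal sketch of the commonsense-discriminator penalty, assuming a BERT entailment model already fine-tuned on the commonsense-caption corpus; the checkpoint name, label convention and penalty weighting below are placeholders, not CAVAN's actual configuration.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# A BERT entailment model fine-tuned on the commonsense-caption corpus is
# assumed; "bert-base-uncased" is only a stand-in checkpoint name.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
entailment_model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # assumed labels: {contradict, entail}
entailment_model.eval()


def commonsense_penalty(commonsense_text, generated_caption):
    """Penalty added to the captioning loss when the generated caption is
    unlikely to be entailed by the queried commonsense knowledge."""
    inputs = tokenizer(commonsense_text, generated_caption,
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = entailment_model(**inputs).logits
    p_entail = torch.softmax(logits, dim=-1)[0, 1]
    return 1.0 - p_entail          # low entailment probability -> high penalty

# e.g. total_loss = caption_nll + lambda_cs * commonsense_penalty(cs, caption)
```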
Visual navigation is a useful and important task for a variety of applications. As the prevalence of robots increases, there is an increasing need for energy-efficient navigation methods as well. Many aspects of efficient visual navigation algorithms have been implemented in the literature, but there is a lack of work on evaluating the efficiency of the image sensors. In this thesis, two methods are evaluated: adaptive image sensor quantization for traditional camera pipelines, and new event-based sensors for low-power computer vision. The first contribution in this thesis is an evaluation of varying levels of linear and logarithmic sensor quantization on the task of visual simultaneous localization and mapping (SLAM). This unconventional method can provide efficiency benefits, with a trade-off between task accuracy and energy-efficiency. The second contribution is a new sensor quantization method, gradient-based quantization, introduced to improve the accuracy of the task. This method lowers the bit level only in parts of the image that are less likely to matter to the SLAM algorithm, since lower bit levels yield better energy-efficiency but worse task accuracy. The third contribution is an evaluation of the efficiency and accuracy of event-based camera intensity representations for the task of optical flow. Results of learning-based optical flow are provided for each of five different reconstruction methods, along with ablation studies. Lastly, the challenges of an event feature-based SLAM system are presented, with results demonstrating the necessity of high-quality and high-resolution event data. The work in this thesis provides studies useful for examining trade-offs in an efficient visual navigation system with traditional and event vision sensors. The results of this thesis also provide multiple directions for future work.
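A minimal sketch of gradient-based quantization, assuming the bit level is chosen per pixel from the local gradient magnitude; the bit depths and threshold below are illustrative, not the settings evaluated in the thesis.

```python
import numpy as np


def gradient_based_quantization(img, high_bits=8, low_bits=4, grad_thresh=20):
    """Quantize low-gradient (likely featureless) regions to fewer bits,
    keeping full precision where strong gradients suggest useful features."""
    img = img.astype(np.float32)
    gy, gx = np.gradient(img)
    grad_mag = np.hypot(gx, gy)

    def quantize(x, bits):
        levels = 2 ** bits
        return np.round(x / 255.0 * (levels - 1)) * (255.0 / (levels - 1))

    out = quantize(img, low_bits)                  # coarse everywhere
    keep = grad_mag >= grad_thresh                 # feature-rich pixels
    out[keep] = quantize(img[keep], high_bits)     # fine where gradients are strong
    return out.astype(np.uint8)
```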