Networked System for Volumetric Athletic Coaching in Augmented Reality

Description
Traditional sports coaching involves face-to-face instruction with athletes or playing back 2D videos of athletes' training. However, if the coach is not in the same location as the athlete, the coach cannot see the athlete's full body and thus cannot give precise guidance, limiting the athlete's improvement. To address these challenges, this paper proposes Augmented Coach, an augmented reality platform where coaches can remotely view, manipulate, and comment on volumetric video data of athletes' movements over a network. In particular, this work includes (a) capturing the athlete's movement with Kinects and converting it into point cloud format, (b) transmitting the point cloud data to the coach's Oculus headset via a 5G or wireless network, and (c) letting the coach comment on the athlete's joints. The evaluation of Augmented Coach covers not only its performance on five metrics in wireless and 5G network environments, but also the coaches' and athletes' experience of using it. The results show that Augmented Coach enables coaches to instruct athletes from a distance and provide effective feedback for correcting athletes' motions over the network.
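
A minimal sketch of the capture-and-stream path this abstract describes: a depth frame is back-projected into a point cloud and sent over a TCP socket with length-prefixed framing. The pinhole intrinsics, frame source, and wire format are illustrative assumptions; the actual system uses the Kinect SDK on the capture side and an Oculus-side receiver.

```python
import socket
import struct
import numpy as np

FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5  # assumed pinhole intrinsics

def depth_to_point_cloud(depth: np.ndarray) -> np.ndarray:
    """Back-project an HxW depth map (meters) into an Nx3 point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels

def send_point_cloud(sock: socket.socket, points: np.ndarray) -> None:
    """Length-prefix each cloud so the receiver knows where a frame ends."""
    payload = points.astype(np.float32).tobytes()
    sock.sendall(struct.pack("!I", len(payload)) + payload)
```
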
Date Created
2023
Agent

Anomaly Detection using Cascade Variational Autoencoder Coupled with Zero Shot Learning – Medical Imaging Use Cases

Description
Detecting anomalies before they are included in downstream diagnosis/prognosis models is an important criterion for maintaining medical AI imaging model performance across internal and external datasets. Furthermore, the need to curate huge amounts of data to train supervised models that produce precise results also requires an automated model that can accurately identify in-distribution (ID) and out-of-distribution (OOD) data to ensure training dataset quality. However, the core challenges in designing such a system are: (i) given the infinite variations of the anomaly, curating training data is infeasible; (ii) assumptions about the types of anomalies are often hypothetical. The proposed work designs an unsupervised anomaly detection model using a cascade variational autoencoder coupled with a zero-shot learning network that maps the latent vectors to semantic attributes. The performance of the proposed model is shown on two different use cases, skin images and chest radiographs, and is also compared against the same class of state-of-the-art generative OOD detection models.
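
An illustrative sketch (not the thesis code) of the two pieces the abstract names: a variational autoencoder that models in-distribution images, and a zero-shot head that maps latent vectors to a semantic attribute space. The OOD score here is plain reconstruction error plus KL divergence; layer sizes and dimensions are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, in_dim=784, hid=256, z_dim=32, attr_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU())
        self.mu = nn.Linear(hid, z_dim)
        self.logvar = nn.Linear(hid, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, hid), nn.ReLU(),
                                 nn.Linear(hid, in_dim), nn.Sigmoid())
        # zero-shot head: latent vector -> semantic attribute space
        self.attr_head = nn.Linear(z_dim, attr_dim)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), self.attr_head(z), mu, logvar

def ood_score(model: VAE, x: torch.Tensor) -> torch.Tensor:
    """Higher score = more anomalous under the in-distribution model."""
    recon, _, mu, logvar = model(x)
    rec_err = ((x - recon) ** 2).sum(dim=1)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)
    return rec_err + kl
```
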
Date Created
2023
Agent

From SLAM to Spatial AI: Using Implicit 3D Latent Space Landmark Reconstruction for Robot Localization and Mapping

Description
Simultaneous localization and mapping (SLAM) has traditionally relied on low-level geometric or optical features. However, these feature-based SLAM methods often struggle with featureless or repetitive scenes. Additionally, low-level features may not provide sufficient information for robot navigation and manipulation, leaving robots without a complete understanding of the 3D spatial world. Higher-level information is necessary to address these limitations. Fortunately, recent developments in learning-based 3D reconstruction allow robots not only to detect semantic meanings, but also to recognize the 3D structure of objects from a few images. By incorporating this 3D structural information, SLAM can be improved from a low-level approach to a structure-aware approach. This work proposes a novel approach for multi-view 3D reconstruction using a recurrent transformer. This approach allows robots to accumulate information from multiple views and encode it into a compact latent space. The resulting latent representations are then decoded to produce 3D structural landmarks, which can be used to improve robot localization and mapping.
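
A schematic sketch of the multi-view accumulation idea, under stated assumptions: per-view features are encoded by a transformer, folded recurrently into a compact latent state, and decoded into a fixed set of 3D landmark coordinates. All module sizes, the recurrent cell choice, and the landmark count are illustrative, not the thesis architecture.

```python
import torch
import torch.nn as nn

class RecurrentViewFusion(nn.Module):
    def __init__(self, feat_dim=256, latent_dim=128, n_landmarks=64):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                           batch_first=True)
        self.view_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.fuse = nn.GRUCell(feat_dim, latent_dim)  # recurrent accumulation
        self.decode = nn.Linear(latent_dim, n_landmarks * 3)
        self.n_landmarks = n_landmarks

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, n_views, n_tokens, feat_dim) image features
        b, n_views, _, _ = views.shape
        h = views.new_zeros(b, self.fuse.hidden_size)
        for i in range(n_views):  # fold each view into the running latent
            tokens = self.view_encoder(views[:, i])
            h = self.fuse(tokens.mean(dim=1), h)
        return self.decode(h).view(b, self.n_landmarks, 3)  # 3D landmarks
```
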
Date Created
2023
Agent

Neural Fields for Tomographic Imaging: with Applications in X-ray Computed Tomography and Synthetic Aperture Sonar

Description
Computed tomography (CT) and synthetic aperture sonar (SAS) are tomographic imaging techniques that are fundamental to applications in medical and remote sensing. Despite their successes, a number of factors constrain their image quality. For example, a time-varying scene during measurement acquisition yields image artifacts. Additionally, factors such as bandlimited or sparse measurements limit image resolution. This thesis presents novel algorithms and techniques that account for these factors during image formation and outperform traditional reconstruction methods. In particular, this thesis formulates analysis-by-synthesis optimizations that leverage neural fields to predict the scene and differentiable physics models that incorporate prior knowledge about image formation. The specific contributions include: (1) a method for reconstructing CT measurements from time-varying (non-stationary) scenes; (2) a method for deconvolving SAS images, which improves image quality; (3) a method that couples neural fields with a differentiable acoustic model for 3D SAS reconstructions.
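
A compact analysis-by-synthesis loop in the spirit described above: a coordinate MLP (neural field) predicts the scene, a differentiable forward model renders synthetic measurements, and the field is fit by gradient descent against the observed data. The toy line-integral renderer and network sizes are stand-in assumptions; the thesis couples the field with physics models of CT and SAS image formation.

```python
import torch
import torch.nn as nn

field = nn.Sequential(            # neural field: (x, y) -> attenuation value
    nn.Linear(2, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1))

def forward_model(field: nn.Module, rays: torch.Tensor) -> torch.Tensor:
    """Toy line-integral renderer: sample the field along each ray and sum.

    rays: (n_rays, n_samples, 2) sample coordinates along each ray.
    """
    return field(rays).squeeze(-1).sum(dim=-1)

def fit(rays, measurements, steps=1000, lr=1e-3):
    """Analysis-by-synthesis: minimize synthetic-vs-observed mismatch."""
    opt = torch.optim.Adam(field.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((forward_model(field, rays) - measurements) ** 2).mean()
        loss.backward()
        opt.step()
    return field
```
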
Date Created
2023
Agent

Finding Flow: Crafting a Website to Help Students Overcome Blocks

Description
Finding Flow was inspired by a previous research project, Zen and the Art of STEAM. The concept of flow was developed by Mihaly Csikszentmihalyi and can be described as "being in the zone." The previous research project focused on digital culture students and whether they could find states of flow within their coursework. This thesis project aimed to develop a website prototype that could be used to help students who struggled to find flow.
Date Created
2023-05
Agent

Computational Imaging for Energy-Efficient Cameras: Adaptive ROI-based Object Tracking and Optically Defocused Event-based Sensing.

Description
Computer vision is becoming an essential component of embedded system applications such as smartphones, wearables, autonomous systems, and the internet of things (IoT). These applications are generally deployed into environments with limited energy, memory bandwidth, and computational resources. This trend is driving the development of energy-efficient image processing solutions from sensing to computation. In this thesis, different alternatives are explored to implement energy-efficient computer vision systems. First, I present a field programmable gate array (FPGA) implementation of an adaptive subsampling algorithm for region-of-interest (ROI) -based object tracking. By implementing the computationally intensive sections of this algorithm on an FPGA, I aim to offload computing resources from energy-inefficient graphics processing units (GPUs) and/or general-purpose central processing units (CPUs). I also present a working system executing this algorithm with near-real-time latency on a standalone embedded device. Second, I present a neural network-based pipeline to improve the performance of event-based cameras in non-ideal optical conditions. Event-based cameras, or dynamic vision sensors (DVS), are bio-inspired sensors that measure logarithmic per-pixel brightness changes in a scene. Their advantages include high dynamic range, low latency, and ultra-low power consumption compared to standard frame-based cameras. Several tasks have been proposed to take advantage of these novel sensors, but they rely on perfectly calibrated, in-focus optical lenses. In this work I propose a method to reconstruct events captured with an out-of-focus event camera so they can be fed into an intensity reconstruction task. The network is trained with a dataset generated by simulating defocus blur in sequences from object tracking datasets such as LaSOT and OTB100. I also test the generalization performance of this network on scenes captured with a DAVIS event-based sensor equipped with an out-of-focus lens.
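
A rough software model of the ROI-based adaptive subsampling idea: pixels inside the tracked bounding box keep full resolution while the background is read at a coarser stride, cutting readout bandwidth. The stride value and box format are assumptions; the thesis maps the computationally intensive parts of this scheme onto an FPGA.

```python
import numpy as np

def adaptive_subsample(frame: np.ndarray, roi, bg_stride: int = 4):
    """frame: HxW image; roi: (x0, y0, x1, y1) box in pixel coordinates."""
    x0, y0, x1, y1 = roi
    out = np.zeros_like(frame)
    # Coarse background: keep one pixel per bg_stride x bg_stride block.
    out[::bg_stride, ::bg_stride] = frame[::bg_stride, ::bg_stride]
    # Full-resolution readout inside the region of interest.
    out[y0:y1, x0:x1] = frame[y0:y1, x0:x1]
    return out
```
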
Date Created
2022
Agent

Video Captioning with Commonsense Knowledge Anchors

Description
A video clip carries not merely an aggregation of static entities, but also a variety of interactions and relations among these entities. Challenges remain for a video captioning system to generate natural language descriptions that focus on the prominent interest and align with latent aspects beyond direct observation. This work presents a Commonsense knowledge Anchored Video cAptioNing approach (dubbed CAVAN). CAVAN exploits inferential commonsense knowledge to assist the training of a video captioning model with a novel paradigm for sentence-level semantic alignment. Specifically, commonsense knowledge is queried from the generic knowledge atlas ATOMIC to complement each training caption, forming a commonsense-caption entailment corpus. A BERT-based language entailment model trained on this corpus then serves as a commonsense discriminator for the training of the video captioning model, penalizing the model for generating semantically misaligned captions. Extensive empirical evaluations on the MSR-VTT, V2C, and VATEX datasets show that CAVAN consistently improves the quality of generations and achieves a higher keyword hit rate. Experimental results with ablations validate the effectiveness of CAVAN and reveal that the use of commonsense knowledge contributes to video caption generation.
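
A sketch of the sentence-level alignment signal described above, under stated assumptions: a BERT entailment model scores a (commonsense inference, generated caption) pair, and the caption model is penalized when the pair is judged misaligned. The model checkpoint, label convention, and loss form are illustrative, not the thesis configuration.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tok = BertTokenizer.from_pretrained("bert-base-uncased")
discriminator = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # assumed: 0 = misaligned, 1 = entailed

def alignment_penalty(commonsense: str, caption: str) -> torch.Tensor:
    """Loss term that grows when the caption is not entailed by the
    commonsense inference paired with it."""
    inputs = tok(commonsense, caption, return_tensors="pt", truncation=True)
    logits = discriminator(**inputs).logits
    # Negative log-probability of the 'entailed' class as the penalty.
    return -torch.log_softmax(logits, dim=-1)[0, 1]
```

In training, a term like this would be added to the captioning loss so that gradients discourage semantically misaligned generations.
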
Date Created
2022
Agent

Towards Energy-efficient Visual Navigation: Sensor Quantization and Event-based Vision Pipelines

Description
Visual navigation is a useful and important task for a variety of applications. As the prevalence of robots increases, there is an increasing need for energy-efficient navigation methods as well. Many aspects of efficient visual navigation algorithms have been implemented in the literature, but there is a lack of work on evaluating the efficiency of the image sensors themselves. In this thesis, two methods are evaluated: adaptive image sensor quantization for traditional camera pipelines, and new event-based sensors for low-power computer vision. The first contribution in this thesis is an evaluation of varying levels of linear and logarithmic sensor quantization on the task of visual simultaneous localization and mapping (SLAM). This unconventional method can provide efficiency benefits with a trade-off between task accuracy and energy efficiency. As a second contribution, a new sensor quantization method, gradient-based quantization, is introduced to improve the accuracy of the task. This method lowers the bit level only in the parts of the image that are less likely to matter to the SLAM algorithm, since lower bit levels give better energy efficiency but worse task accuracy. The third contribution is an evaluation of the efficiency and accuracy of event-based camera intensity representations on the task of optical flow. Results of a learning-based optical flow method are provided for each of five different reconstruction methods, along with ablation studies. Lastly, the challenges of an event feature-based SLAM system are presented, with results demonstrating the necessity of high-quality and high-resolution event data. The work in this thesis provides studies useful for examining trade-offs in an efficient visual navigation system with traditional and event vision sensors. The results of this thesis also suggest multiple directions for future work.
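
A toy version of the gradient-based quantization idea: regions with low gradient magnitude (unlikely to yield SLAM features) are quantized to fewer bits, while high-gradient regions keep full precision. The threshold and bit levels are illustrative assumptions, not the thesis parameters.

```python
import numpy as np

def gradient_based_quantize(img: np.ndarray, low_bits: int = 4,
                            high_bits: int = 8,
                            thresh: float = 10.0) -> np.ndarray:
    """img: HxW uint8 image; returns a mixed bit-depth quantized image."""
    gy, gx = np.gradient(img.astype(np.float32))
    mag = np.hypot(gx, gy)                  # per-pixel gradient magnitude
    shift = high_bits - low_bits
    coarse = (img >> shift) << shift        # drop low-order bits
    out = img.copy()
    out[mag < thresh] = coarse[mag < thresh]  # coarsen flat regions only
    return out
```
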
Date Created
2022
Agent