Multi-Camera Bird-Eye-View Occupancy Detection for Intelligent Transportation System

Description

3D perception poses a significant challenge in Intelligent Transportation Systems (ITS) due to occlusion and limited field of view. The necessity for real-time processing and alignment with existing traffic infrastructure compounds these limitations. To counter these issues, this work introduces a novel multi-camera Bird-Eye View (BEV) occupancy detection framework. This approach leverages multi-camera setups to overcome occlusion and field-of-view limitations while employing BEV occupancy to simplify the 3D perception task, ensuring critical information is retained. A novel dataset for BEV occupancy detection, encompassing diverse scenes and varying camera configurations, was created using the CARLA simulator. Subsequent extensive evaluation of various multi-view occupancy detection models showcased the critical roles of scene diversity and occupancy grid resolution in enhancing model performance. A structured framework that complements the generated data is proposed for data collection in the real world. The trained model is validated against real-world conditions to ensure its practical application, demonstrating the influence of robust dataset design in refining ITS perception systems. This work contributes to significant advancements in traffic management, safety, and operational efficiency.
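The core BEV occupancy idea can be sketched as rasterizing fused, ground-plane vehicle detections from all cameras into a discrete grid. A minimal sketch follows; the grid extent, resolution, and function names are illustrative assumptions, not values or code from the thesis:

```python
# Sketch: rasterize fused vehicle detections (ground-plane positions in
# meters) into a bird's-eye-view occupancy grid. Grid extent and
# resolution here are illustrative assumptions.

def world_to_cell(x, y, extent=40.0, resolution=0.5):
    """Map a ground-plane point (x, y) in meters, with the grid centered
    at the origin, to integer (row, col) indices; None if out of range."""
    half = extent / 2.0
    if not (-half <= x < half and -half <= y < half):
        return None
    col = int((x + half) / resolution)
    row = int((y + half) / resolution)
    return row, col

def build_occupancy_grid(detections, extent=40.0, resolution=0.5):
    """detections: iterable of (x, y) vehicle centers fused from all
    cameras. Returns a 2D list of 0/1 occupancy values."""
    n = int(extent / resolution)
    grid = [[0] * n for _ in range(n)]
    for x, y in detections:
        cell = world_to_cell(x, y, extent, resolution)
        if cell is not None:
            grid[cell[0]][cell[1]] = 1
    return grid

# Two in-range detections and one outside the grid extent.
grid = build_occupancy_grid([(0.0, 0.0), (10.2, -3.7), (100.0, 0.0)])
```

The abstract's point about occupancy grid resolution shows up directly here: halving `resolution` quadruples the number of cells and shifts the trade-off between localization precision and model cost.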
Date Created
2024

Roundabout Dilemma Zone Detection with Trajectory Forecasting

Description

In recent years, there has been a growing emphasis on developing automated systems to enhance traffic safety, particularly in the detection of dilemma zones (DZ) at intersections. This study focuses on the automated detection of DZs at roundabouts using trajectory forecasting, presenting an advanced system with perception capabilities. The system utilizes a modular, graph-structured recurrent model that predicts the trajectories of various agents, accounting for agent dynamics and incorporating heterogeneous data such as semantic maps. This enables the system to facilitate traffic management decision-making and improve overall intersection safety. To assess the system's performance, a real-world dataset of traffic roundabout intersections was employed. The experimental results demonstrate that our Superpowered Trajectron++ system exhibits high accuracy in detecting DZ events, with a false positive rate of approximately 10%. Furthermore, the system has the remarkable ability to anticipate and identify dilemma events before they occur, enabling it to provide timely instructions to vehicles. These instructions serve as guidance, determining whether vehicles should come to a halt or continue moving through the intersection, thereby enhancing safety and minimizing potential conflicts. In summary, the development of automated systems for detecting DZs represents an important advancement in traffic safety. The Superpowered Trajectron++ system, with its trajectory forecasting capabilities and incorporation of diverse data sources, showcases improved accuracy in identifying DZ events and can effectively guide vehicles in making informed decisions at roundabout intersections.
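The decision logic behind DZ detection from a forecast trajectory can be reduced to a time-to-line test. The sketch below is a simplification of that idea; the 2.5–5.5 s window is a commonly used dilemma-zone heuristic, not the calibrated values or the actual Trajectron++-based model from this study:

```python
# Sketch: flag a predicted dilemma-zone (DZ) event from a forecast
# trajectory. The DZ time window is an illustrative heuristic.

def time_to_line(trajectory, line_y, dt=0.5):
    """trajectory: list of (x, y) predicted positions at dt spacing,
    with the vehicle moving toward increasing y. Returns seconds until
    the vehicle first reaches y = line_y, or None if it never does."""
    for i, (_, y) in enumerate(trajectory):
        if y >= line_y:
            return i * dt
    return None

def in_dilemma_zone(trajectory, line_y, dt=0.5, t_min=2.5, t_max=5.5):
    """True when the predicted arrival at the yield line falls in the
    window where neither stopping nor proceeding is clearly safe."""
    t = time_to_line(trajectory, line_y, dt)
    return t is not None and t_min <= t <= t_max

# A vehicle approaching the line at a constant 4 m/s.
traj = [(0.0, i * 2.0) for i in range(12)]
```

In the study's system the trajectory would come from the forecasting model rather than a constant-velocity assumption, and the flagged event would trigger a stop-or-go instruction to the vehicle.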
Date Created
2023

SKOPE3D: A Synthetic Keypoint Perception 3D Dataset for Vehicle Pose Estimation

Description

Intelligent transportation systems (ITS) are a boon to modern-day road infrastructure. They support traffic monitoring, road safety improvement, congestion reduction, and other traffic management tasks. For an ITS, roadside perception capability with cameras, LIDAR, and RADAR sensors is key. Among various roadside perception technologies, vehicle keypoint detection is a fundamental problem, which involves detecting and localizing specific points on a vehicle, such as the headlights, wheels, and taillights. These keypoints can be used to track the movement of vehicles and their orientation. However, there are several challenges in vehicle keypoint detection, such as the variation in vehicle models and shapes, the presence of occlusion in traffic scenarios, and the influence of weather and changing lighting conditions. More importantly, existing traffic perception datasets for keypoint detection are mainly limited to the frontal view, with sensors mounted on the ego vehicles. These datasets are not designed for traffic monitoring cameras mounted on roadside poles. Capturing data from roadside cameras offers a major advantage, as they can cover a much larger distance with a wider field of view across many different traffic scenes, but such a dataset is usually expensive to construct. In this research, I present SKOPE3D: Synthetic Keypoint Perception 3D dataset, a one-of-its-kind synthetic perception dataset generated using a simulator from the roadside perspective. It comes with 2D bounding boxes, 3D bounding boxes, tracking IDs, and 33 keypoints for each vehicle in the scene. The dataset consists of 25K frames spanning 28 scenes, with over 150K vehicles and 4.9M keypoints. A baseline Keypoint R-CNN model is trained on the dataset and thoroughly evaluated on the test set. The experiments show the capability of the synthetic dataset and the transferability of knowledge between synthetic and real-world data.
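The label types the abstract lists (2D box, 3D box, tracking ID, 33 keypoints per vehicle) can be made concrete with a minimal per-vehicle record and a consistency check. The field names and layout below are illustrative assumptions, not the actual SKOPE3D schema:

```python
# Sketch: a minimal per-vehicle annotation record mirroring the label
# types SKOPE3D provides. Field names and conventions are assumptions.

def validate_annotation(record, n_keypoints=33):
    """Check that one vehicle annotation carries the expected labels."""
    assert len(record["bbox_2d"]) == 4       # x, y, w, h in pixels
    assert len(record["bbox_3d"]) == 7       # x, y, z, l, w, h, yaw
    assert isinstance(record["track_id"], int)
    assert len(record["keypoints"]) == n_keypoints
    for kp in record["keypoints"]:
        assert len(kp) == 3                  # x, y, visibility flag
    return True

vehicle = {
    "bbox_2d": [412.0, 220.0, 96.0, 64.0],
    "bbox_3d": [5.1, -2.0, 0.8, 4.5, 1.8, 1.5, 0.12],
    "track_id": 7,
    "keypoints": [[0.0, 0.0, 0]] * 33,
}
```

A record like this, repeated per vehicle per frame across 25K frames, is what makes the reported 150K vehicles and 4.9M keypoints possible to enumerate.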
Date Created
2023

AvaCAR

Description

For a system of autonomous vehicles functioning together in a traffic scene, 3D understanding of participants in the field of view or surroundings is essential for assessing the safe operation of all involved. This problem can be decomposed into online pose and shape estimation, which has been a core research area of computer vision for over a decade. This work supports and improves the joint estimation of the pose and shape of vehicles from monocular cameras. Jointly estimating vehicle pose and shape online is enabled by what is called an offline reconstruction pipeline. In the offline reconstruction step, an approach to obtain the vehicle 3D shape with labeled keypoints is formulated. This work proposes a multi-view reconstruction pipeline using images and masks that can create an approximate shape of vehicles to be used as a shape prior. Then a 3D model-fitting optimization approach is developed to refine the shape prior using high-quality computer-aided design (CAD) models of vehicles. A dataset of such 3D vehicles with 20 annotated keypoints is prepared, called the AvaCAR dataset. The AvaCAR dataset can be used to estimate vehicle shape and pose without the need to collect the significant amounts of data required for adequate training of a neural network. The online reconstruction can use this synthetic dataset to generate novel viewpoints and simultaneously train a neural network for pose and shape estimation. Most methods in the current literature that use deep neural networks trained to estimate object pose from a single image are inherently biased toward the viewpoints of the images used. This approach aims to address these limitations by delivering to the online estimation a shape prior that can generate novel views to account for viewpoint bias.
The dataset is provided with ground-truth extrinsic parameters and compact vector-based shape representations, which, along with the multi-view dataset, can be used to efficiently train neural networks for vehicle pose and shape estimation. The vehicles in this library are evaluated with standard metrics to ensure they can aid online estimation and model-based tracking.
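The flavor of the model-fitting step (aligning a CAD-model shape to reconstructed keypoints) can be illustrated with a least-squares scale-and-translation fit. This is a deliberate reduction of the thesis's richer 3D optimization, under the assumption of per-axis similarity alignment:

```python
# Sketch: align source keypoints (e.g. from a CAD model) to target
# keypoints with a per-axis scale + translation, minimizing squared
# error. An illustrative reduction of full shape-prior refinement.

def fit_scale_translation(src, dst):
    """Least-squares scale s and translation t per axis so that
    s*src + t ~ dst, for equal-length lists of 3D points. Returns the
    (s, t) pairs and the fitted points."""
    n = len(src)
    params = []
    for axis in range(3):
        a = [p[axis] for p in src]
        b = [p[axis] for p in dst]
        ma, mb = sum(a) / n, sum(b) / n
        var = sum((x - ma) ** 2 for x in a)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        s = cov / var if var else 1.0
        params.append((s, mb - s * ma))
    fitted = [[params[ax][0] * p[ax] + params[ax][1] for ax in range(3)]
              for p in src]
    return params, fitted

src = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 0.0, 1.0)]
dst = [(1.0, 1.0, 1.0), (3.0, 3.0, 3.0), (5.0, 1.0, 3.0)]  # 2*src + 1
params, fitted = fit_scale_translation(src, dst)
```

A full pipeline would additionally estimate rotation and deform the shape within a learned basis; the closed-form fit above only conveys the optimization objective.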
Date Created
2022

Vehicle Re-identification Using a Multi-View Vehicle Dataset

Description

There has been an explosion in the amount of data on the internet because of modern technology – especially image data – as a consequence of the exponential growth in the number of cameras in the world; from extensive surveillance camera systems to billions of people carrying smartphones with built-in cameras. With this sudden increase in the accessibility of cameras, most of the data captured through these devices ends up on the internet. Researchers soon leveraged this data by creating large-scale datasets. However, generating a dataset – let alone a large-scale one – requires many man-hours. This work presents an algorithm that makes use of optical flow and feature matching, along with localization outputs from a Mask R-CNN, to generate large-scale vehicle datasets without much human supervision. Additionally, this work proposes a novel multi-view vehicle dataset (MVVdb) of 500 vehicles, which is also generated using the aforementioned algorithm. There are various research problems in computer vision that can leverage a multi-view dataset, e.g., 3D pose estimation and 3D object detection. A multi-view vehicle dataset can also be used for 2D-image-to-3D-shape prediction, generation of 3D vehicle models, and even more robust vehicle make and model recognition. In this work, a ResNet is trained on the multi-view vehicle dataset to perform vehicle re-identification, which is fundamentally similar to a vehicle make and model recognition problem – also showcasing the usability of the MVVdb dataset.
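One way the optical flow and feature matching pieces fit with the Mask R-CNN boxes is box propagation: carrying a detection from one frame to the next by the median displacement of matched points inside it. The sketch below assumes stubbed detector and matcher outputs; names and the median-flow choice are illustrative, not the thesis's exact algorithm:

```python
# Sketch: propagate a detector box between consecutive frames using the
# median displacement of point matches that fall inside it. Matches
# would come from optical flow / feature matching in a real pipeline.

def propagate_box(box, matches):
    """box: (x1, y1, x2, y2). matches: list of ((x, y), (x2, y2)) point
    correspondences between consecutive frames. Returns the box shifted
    by the median flow of matches inside it, unchanged if none fall in."""
    x1, y1, x2, y2 = box
    inside = [(bx - ax, by - ay) for (ax, ay), (bx, by) in matches
              if x1 <= ax <= x2 and y1 <= ay <= y2]
    if not inside:
        return box
    dxs = sorted(d for d, _ in inside)
    dys = sorted(d for _, d in inside)
    mdx, mdy = dxs[len(dxs) // 2], dys[len(dys) // 2]
    return (x1 + mdx, y1 + mdy, x2 + mdx, y2 + mdy)

# Two matches inside the box agree on a (3, 1) shift; one outlier
# outside the box is ignored.
new_box = propagate_box((0.0, 0.0, 10.0, 10.0),
                        [((1.0, 1.0), (4.0, 2.0)),
                         ((2.0, 2.0), (5.0, 3.0)),
                         ((50.0, 50.0), (51.0, 51.0))])
```

Taking the median rather than the mean makes the shift robust to stray matches from the background, which is what lets such a pipeline run with little human supervision.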
Date Created
2022

3D In-Air-Handwriting based User Login and Identity Input Method

Description

Applications over a gesture-based human-computer interface (HCI) require a new user login method with gestures because such an interface does not have traditional input devices. For example, a user may be asked to verify their identity to unlock a device on a mobile or wearable platform, or to sign in to a virtual site over a Virtual Reality (VR) or Augmented Reality (AR) headset, where no physical keyboard or touchscreen is available. This dissertation presents a unified user login framework and an identity input method using 3D In-Air-Handwriting (IAHW), where a user can log in to a virtual site by quickly writing a passcode in the air, like a signature. The presented research contains multiple tasks that span motion signal modeling, user authentication, user identification, template protection, and a thorough evaluation of both security and usability. The results show around 0.1% to 3% Equal Error Rate (EER) in user authentication under different conditions, as well as 93% accuracy in user identification, on a dataset with over 100 users and two types of gesture input devices. Moreover, current research in this area is severely limited by the availability of gesture input devices, datasets, and software tools. This study provides an infrastructure for IAHW research with an open-source library and open datasets of more than 100K IAHW hand movement signals. Additionally, the proposed user identity input method can be extended to a general word input method for both English and Chinese using limited training data. Hence, this dissertation can help the research community in both cybersecurity and HCI explore IAHW as a new direction, and potentially pave the way to practical adoption of such technologies.
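The matching step behind this kind of authentication can be illustrated with dynamic time warping (DTW) between a stored template signal and a login attempt, accepted under a distance threshold. This is a bare sketch under that assumption; the dissertation's actual pipeline (alignment, feature extraction, template protection, calibrated thresholds) is far richer:

```python
# Sketch: DTW-based matching of an in-air-handwriting motion signal
# against a stored template. Threshold and signals are illustrative.

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1D signals."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def authenticate(template, attempt, threshold=5.0):
    """Accept the attempt when its warped distance to the template is
    small enough. The threshold sets the FAR/FRR trade-off."""
    return dtw_distance(template, attempt) <= threshold

template = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
genuine = [0.0, 1.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]  # same shape, slower
```

Sweeping the threshold trades false accepts against false rejects; the EER reported in the abstract is the operating point where those two error rates are equal.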
Date Created
2021