COPILOT: Human Collision Prediction and Localization from Multi-view Egocentric Videos

10/04/2022
by Boxiao Pan, et al.

To produce safe human motions, assistive wearable exoskeletons must be equipped with a perception system that enables anticipating potential collisions from egocentric observations. However, previous approaches to exoskeleton perception greatly simplify the problem to specific types of environments, limiting their scalability. In this paper, we propose the challenging and novel problem of predicting human-scene collisions for diverse environments from multi-view egocentric RGB videos captured from an exoskeleton. By classifying which body joints will collide with the environment and predicting a collision region heatmap that localizes potential collisions in the environment, we aim to develop an exoskeleton perception system that generalizes to complex real-world scenes and provides actionable outputs for downstream control. We propose COPILOT, a video transformer-based model that performs both collision prediction and localization simultaneously, leveraging multi-view video inputs via a proposed joint space-time-viewpoint attention operation. To train and evaluate the model, we build a synthetic data generation framework to simulate virtual humans moving in photo-realistic 3D environments. This framework is then used to establish a dataset consisting of 8.6M egocentric RGBD frames to enable future work on the problem. Extensive experiments suggest that our model achieves promising performance and generalizes to unseen scenes as well as to the real world. We apply COPILOT to a downstream collision avoidance task, and successfully reduce collision cases by 29%.
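The joint space-time-viewpoint attention mentioned above can be pictured as ordinary self-attention applied to tokens flattened across all three axes at once, so every spatial patch in every frame of every camera view can attend to every other. The sketch below is an illustrative simplification, not the paper's implementation: it uses identity query/key/value projections, a single head, and numpy in place of a deep learning framework; the function name `joint_stv_attention` is our own.

```python
import numpy as np

def joint_stv_attention(tokens):
    """Scaled dot-product self-attention over tokens flattened jointly
    across time (T), viewpoint (V), and space (S).

    tokens: array of shape (T, V, S, D), where D is the feature dim.
    Query/key/value projections are omitted (identity) for brevity.
    """
    T, V, S, D = tokens.shape
    x = tokens.reshape(T * V * S, D)              # flatten all three axes into one token sequence
    scores = x @ x.T / np.sqrt(D)                 # pairwise attention logits
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all T*V*S tokens
    out = weights @ x                             # attended features
    return out.reshape(T, V, S, D)

# Toy input: 2 frames, 4 camera views, 9 spatial patches, 8-dim features.
x = np.random.randn(2, 4, 9, 8)
y = joint_stv_attention(x)
print(y.shape)  # (2, 4, 9, 8)
```

Flattening all three axes gives full cross-view and cross-time interaction in one attention pass, at the cost of quadratic scaling in T·V·S; factorized variants (attending over one axis at a time) trade expressiveness for efficiency.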


