My View is the Best View: Procedure Learning from Egocentric Videos

07/22/2022
by   Siddhant Bansal, et al.

Procedure learning involves identifying the key-steps of a task and determining their logical order. Existing approaches commonly learn the procedure from third-person videos, in which the manipulated object appears small and is often occluded by the actor, leading to significant errors. In contrast, we observe that videos from first-person (egocentric) wearable cameras provide an unobstructed and clear view of the action. However, procedure learning from egocentric videos is challenging because (a) the camera view undergoes extreme changes due to the wearer's head motion, and (b) the unconstrained nature of the videos introduces unrelated frames. As a result, current state-of-the-art methods' assumption that actions occur at approximately the same time and have the same duration does not hold. Instead, we propose to use the signal provided by the temporal correspondences between key-steps across videos. To this end, we present a novel self-supervised Correspond and Cut (CnC) framework for procedure learning. CnC identifies and utilizes the temporal correspondences between key-steps across multiple videos to learn the procedure. Our experiments show that CnC outperforms the state-of-the-art on the benchmark ProceL and CrossTask datasets by 5.2%. Furthermore, for procedure learning from egocentric videos, we propose the EgoProceL dataset, consisting of 62 hours of videos captured by 130 subjects performing 16 tasks. The source code and the dataset are available on the project page https://sid2697.github.io/egoprocel/.
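The core signal the abstract describes is temporal correspondence: frames depicting the same key-step in different videos should have similar embeddings, even if they occur at different times. The sketch below illustrates one simple way such correspondences could be computed, via mutual nearest neighbours in cosine-similarity space between the frame embeddings of two videos. This is an illustrative assumption, not the paper's actual CnC implementation; the embedding lists, `cosine`, and `correspondences` are hypothetical names introduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    num = sum(a * b for a, b in zip(u, v))
    du = math.sqrt(sum(a * a for a in u))
    dv = math.sqrt(sum(b * b for b in v))
    return num / (du * dv) if du and dv else 0.0

def correspondences(emb_a, emb_b):
    """Mutual nearest-neighbour frame pairs between two videos.

    A pair (i, j) is kept only if frame i of video A picks frame j of
    video B as its nearest neighbour AND vice versa; this discards the
    unrelated frames that have no counterpart in the other video.
    """
    nn_ab = [max(range(len(emb_b)), key=lambda j: cosine(a, emb_b[j]))
             for a in emb_a]
    nn_ba = [max(range(len(emb_a)), key=lambda i: cosine(b, emb_a[i]))
             for b in emb_b]
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

# Toy 2-D embeddings: frame 0 of each video shows one key-step,
# frame 1 another, at (possibly) different times in each video.
video_a = [[1.0, 0.0], [0.0, 1.0]]
video_b = [[1.0, 0.1], [0.1, 1.0]]
pairs = correspondences(video_a, video_b)
print(pairs)  # [(0, 0), (1, 1)]
```

In a full pipeline, the matched pairs would then be clustered into key-steps and ordered (for example, by their mean timestamps) to recover the procedure; the paper's framework learns the embeddings themselves self-supervised, which this sketch takes as given.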


Related research

09/05/2015 · Co-interest Person Detection from Multiple Wearable Camera Videos
Wearable cameras, such as Google Glass and Go Pro, enable video data col...

03/29/2018 · Joint Person Segmentation and Identification in Synchronized First- and Third-person Videos
In a world in which cameras are becoming more and more pervasive, scenes...

01/06/2022 · Enhancing Egocentric 3D Pose Estimation with Third Person Views
In this paper, we propose a novel approach to enhance the 3D body pose e...

07/16/2015 · Multi-Face Tracking by Extended Bag-of-Tracklets in Egocentric Videos
Wearable cameras offer a hands-free way to record egocentric images of d...

05/27/2020 · 4D Visualization of Dynamic Events from Unconstrained Multi-View Videos
We present a data-driven approach for 4D space-time visualization of dyn...

11/17/2021 · Learning to Align Sequential Actions in the Wild
State-of-the-art methods for self-supervised sequential action alignment...

03/04/2017 · Automated Top View Registration of Broadcast Football Videos
In this paper, we propose a novel method to register football broadcast ...
