Egocentric Activity Recognition and Localization on a 3D Map

05/20/2021
by   Miao Liu, et al.
0

Given a video captured from a first person perspective and recorded in a familiar environment, can we recognize what the person is doing and identify where the action occurs in the 3D space? We address this challenging problem of jointly recognizing and localizing actions of a mobile user on a known 3D map from egocentric videos. To this end, we propose a novel deep probabilistic model. Our model takes the inputs of a Hierarchical Volumetric Representation (HVR) of the environment and an egocentric video, infers the 3D action location as a latent variable, and recognizes the action based on the video and contextual cues surrounding its potential locations. To evaluate our model, we conduct extensive experiments on a newly collected egocentric video dataset, in which both human naturalistic actions and photo-realistic 3D environment reconstructions are captured. Our method demonstrates strong results on both action recognition and 3D action localization across seen and unseen environments. We believe our work points to an exciting research direction in the intersection of egocentric vision, and 3D scene understanding.

READ FULL TEXT

page 1

page 4

page 8

page 13

page 14

page 15

research
05/31/2020

In the Eye of the Beholder: Gaze and Actions in First Person Video

We address the task of jointly determining what a person is doing and wh...
research
06/14/2023

What can a cook in Italy teach a mechanic in India? Action Recognition Generalisation Over Scenarios and Locations

We propose and address a new generalisation problem: can a model trained...
research
01/14/2020

EGO-TOPO: Environment Affordances from Egocentric Video

First-person video naturally brings the use of a physical environment to...
research
04/07/2016

Trajectory Aligned Features For First Person Action Recognition

Egocentric videos are characterised by their ability to have the first p...
research
02/10/2016

DAP3D-Net: Where, What and How Actions Occur in Videos?

Action parsing in videos with complex scenes is an interesting but chall...
research
11/05/2021

Technical Report: Disentangled Action Parsing Networks for Accurate Part-level Action Parsing

Part-level Action Parsing aims at part state parsing for boosting action...
research
11/30/2016

Sequential Person Recognition in Photo Albums with a Recurrent Network

Recognizing the identities of people in everyday photos is still a very ...

Please sign up or login with your details

Forgot password? Click here to reset