First Person Action-Object Detection with EgoNet

03/15/2016
by Gedas Bertasius, et al.

Unlike traditional third-person cameras mounted on robots, a first-person camera captures a person's visual sensorimotor object interactions from up close. In this paper, we study the tight interplay between a person's momentary visual attention and motor action with objects, as seen from a first-person camera. We propose the concept of action-objects: the objects that capture a person's conscious visual (watching a TV) or tactile (taking a cup) interactions. Action-objects may be task-dependent, but since many tasks share common person-object spatial configurations, action-objects exhibit a characteristic 3D spatial distance and orientation with respect to the person. We design a predictive model that detects action-objects using EgoNet, a joint two-stream network that holistically integrates visual appearance (RGB) and 3D spatial layout (depth and height) cues to predict the per-pixel likelihood of action-objects. Our network also incorporates a first-person coordinate embedding, which is designed to learn the spatial distribution of action-objects in first-person data. We demonstrate EgoNet's predictive power by showing that it consistently outperforms previous baseline approaches. Furthermore, EgoNet exhibits a strong generalization ability, i.e., it predicts semantically meaningful objects in novel first-person datasets. Our method's ability to effectively detect action-objects could be used to improve robots' understanding of human-object interactions.
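The abstract names three ingredients: an appearance stream, a 3D spatial layout stream, and a first-person coordinate embedding, all combined into a per-pixel likelihood. The following is a minimal NumPy sketch of that idea and not the EgoNet architecture itself. The function names, the normalized x/y coordinate channels, and the single linear fusion layer followed by a sigmoid are all assumptions made for illustration; the abstract does not specify how the embedding or the fusion is actually implemented.

```python
import numpy as np


def coordinate_embedding(height, width):
    """Hypothetical embedding: a (2, H, W) array of normalized x and y
    pixel coordinates in [0, 1]. This is an assumption; the abstract only
    says the embedding learns a spatial distribution of action-objects."""
    ys, xs = np.meshgrid(
        np.linspace(0.0, 1.0, height),
        np.linspace(0.0, 1.0, width),
        indexing="ij",
    )
    return np.stack([xs, ys], axis=0)


def fuse_streams(rgb_feat, layout_feat, weights, bias):
    """Fuse two per-pixel feature maps and the coordinate channels into a
    per-pixel probability map (illustrative stand-in for a learned fusion).

    rgb_feat:    (C1, H, W) features from an appearance stream
    layout_feat: (C2, H, W) features from a depth/height stream
    weights:     (C1 + C2 + 2,) linear fusion weights (assumed, per channel)
    bias:        scalar
    """
    _, h, w = rgb_feat.shape
    coords = coordinate_embedding(h, w)
    fused = np.concatenate([rgb_feat, layout_feat, coords], axis=0)
    # Weighted sum over the channel axis gives one logit per pixel.
    logits = np.tensordot(weights, fused, axes=([0], [0])) + bias
    # Sigmoid maps each logit to a likelihood in (0, 1).
    return 1.0 / (1.0 + np.exp(-logits))


# Toy usage with random stand-in features (not real network activations).
rng = np.random.default_rng(0)
rgb = rng.normal(size=(3, 4, 5))
layout = rng.normal(size=(2, 4, 5))
w = rng.normal(size=(3 + 2 + 2,))
prob_map = fuse_streams(rgb, layout, w, bias=0.0)
print(prob_map.shape)
```

In a trained model the two feature maps would come from convolutional streams and the fusion weights would be learned; here they are random placeholders so that the shapes and the role of the coordinate channels are easy to follow.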


