H2O: Two Hands Manipulating Objects for First Person Interaction Recognition

04/22/2021
by Taein Kwon, et al.

We present, for the first time, a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects. To this end, we propose a method to create a unified dataset for egocentric 3D interaction recognition. For each frame, our method produces annotations of the 3D pose of both hands and the 6D pose of the manipulated objects, along with interaction labels. Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for the left and right hands, 6D object poses, ground-truth camera poses, object meshes, and scene point clouds. To the best of our knowledge, this is the first benchmark that enables the study of first-person actions using the poses of both the left and right hands manipulating objects, and it offers an unprecedented level of detail for egocentric 3D interaction recognition. We further propose the first method to predict interaction classes by jointly estimating the 3D pose of two hands and the 6D pose of the manipulated objects from RGB images. Our method models both inter- and intra-dependencies between the hands and objects by learning the topology of a graph convolutional network that predicts interactions. We show that our method, facilitated by this dataset, establishes a strong baseline for joint hand-object pose estimation and achieves state-of-the-art accuracy for first-person interaction recognition.
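The abstract says interactions are predicted by a graph convolutional network whose topology is learned over hand and object nodes. Below is a minimal NumPy sketch of that general idea. It is not the authors' implementation. The node layout is an assumption: 21 joints per hand and 9 object nodes (8 bounding-box corners plus a centroid). The class `LearnedTopologyGCN`, its layer sizes, and the number of interaction classes are hypothetical. Only the forward pass is shown. "Learning the topology" here just means that the adjacency logits `adj_logits` are a trainable parameter, which an autodiff framework would update during training.

```python
import numpy as np

# Hypothetical node layout (an assumption, not taken from the paper):
# 21 joints per hand, two hands, plus 9 object nodes (8 box corners + centroid).
N_HAND_JOINTS = 21
N_OBJ_NODES = 9
N_NODES = 2 * N_HAND_JOINTS + N_OBJ_NODES  # 51 nodes in total


class LearnedTopologyGCN:
    """Hypothetical two-layer GCN whose adjacency matrix is a free parameter."""

    def __init__(self, in_dim, hidden_dim, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        # Adjacency logits start near the identity (self-loops) plus small noise;
        # in training these would be optimized jointly with the layer weights.
        self.adj_logits = np.eye(N_NODES) + 0.01 * rng.standard_normal((N_NODES, N_NODES))
        self.w1 = 0.1 * rng.standard_normal((in_dim, hidden_dim))
        self.w2 = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))
        self.w_out = 0.1 * rng.standard_normal((hidden_dim, n_classes))

    def adjacency(self):
        # Row-wise softmax: each node's neighbor weights are positive and sum to 1,
        # so hand-hand, hand-object and object-object links are all learnable.
        a = self.adj_logits - self.adj_logits.max(axis=1, keepdims=True)
        e = np.exp(a)
        return e / e.sum(axis=1, keepdims=True)

    def forward(self, x):
        # x: (N_NODES, in_dim) node features, e.g. 3D hand joints and object corners.
        a = self.adjacency()
        h = np.maximum(a @ x @ self.w1, 0.0)  # graph convolution 1 + ReLU
        h = np.maximum(a @ h @ self.w2, 0.0)  # graph convolution 2 + ReLU
        pooled = h.mean(axis=0)               # average over nodes (graph readout)
        logits = pooled @ self.w_out
        logits = logits - logits.max()        # numerically stable softmax
        p = np.exp(logits)
        return p / p.sum()                    # interaction-class probabilities


# Usage with random stand-in poses (illustrative only, 5 hypothetical classes).
model = LearnedTopologyGCN(in_dim=3, hidden_dim=16, n_classes=5)
poses = np.random.default_rng(1).standard_normal((N_NODES, 3))
probs = model.forward(poses)
print(probs.shape, round(float(probs.sum()), 6))
```

Because the adjacency is a dense parameter rather than a fixed skeleton, the network can weight links between the two hands and between hands and the object. This is one way to capture the inter- and intra-dependencies that the abstract describes.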

Related research

04/08/2017
First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations
In this work we study the use of 3D hand poses to recognize first-person...

12/09/2017
Single-Shot Multi-Person 3D Body Pose Estimation From Monocular RGB Input
We propose a new efficient single-shot method for multi-person 3D pose e...

08/01/2022
MV6D: Multi-View 6D Pose Estimation on RGB-D Frames Using a Deep Point-wise Voting Network
Estimating 6D poses of objects is an essential computer vision task. How...

04/10/2019
H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions
We present a unified framework for understanding 3D hand and object inte...

09/26/2022
InterCap: Joint Markerless 3D Tracking of Humans and Objects in Interaction
Humans constantly interact with daily objects to accomplish tasks. To un...

06/17/2018
The RBO Dataset of Articulated Objects and Interactions
We present a dataset with models of 14 articulated objects commonly foun...

08/16/2019
RIO: 3D Object Instance Re-Localization in Changing Indoor Environments
In this work, we introduce the task of 3D object instance re-localizatio...