Grounded Human-Object Interaction Hotspots from Video

12/11/2018
by Tushar Nagarajan, et al.

Learning how to interact with objects is an important step towards embodied visual intelligence, but existing techniques suffer from heavy supervision or sensing requirements. We propose an approach to learn human-object interaction "hotspots" directly from video. Rather than treating affordances as a manually supervised semantic segmentation task, our approach learns about interactions by watching videos of real human behavior and recognizing afforded actions. Given a novel image or video, our model infers a spatial hotspot map indicating how an object would be manipulated in a potential interaction, even if the object is currently at rest. Through results with both first- and third-person video, we show the value of grounding affordance maps in real human-object interactions. Not only are our weakly supervised grounded hotspots competitive with strongly supervised affordance methods, but they can also anticipate object function for novel objects and enhance object recognition.

research
06/03/2019

Grounded Human-Object Interaction Hotspots from Video (Extended Abstract)

Learning how to interact with objects is an important step towards embod...
research
05/08/2018

Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction

We study weakly-supervised video object grounding: given a video segment...
research
10/07/2021

Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions

We introduce the task of weakly supervised learning for detecting human ...
research
06/22/2020

Understanding Object Affordances Through Verb Usage Patterns

In order to interact with objects in our environment, we rely on an unde...
research
11/16/2016

Unsupervised Learning of Important Objects from First-Person Videos

A first-person camera, placed at a person's head, captures, which object...
research
09/10/2019

Reasoning About Human-Object Interactions Through Dual Attention Networks

Objects are entities we act upon, where the functionality of an object i...
research
03/03/2021

Learning Asynchronous and Sparse Human-Object Interaction in Videos

Human activities can be learned from video. With effective modeling it i...
