Grounded Human-Object Interaction Hotspots from Video (Extended Abstract)

06/03/2019
by Tushar Nagarajan, et al.

Learning how to interact with objects is an important step towards embodied visual intelligence, but existing techniques suffer from heavy supervision or sensing requirements. We propose an approach to learn human-object interaction "hotspots" directly from video. Rather than treating affordances as a manually supervised semantic segmentation task, our approach learns about interactions by watching videos of real human behavior and anticipating afforded actions. Given a novel image or video, our model infers a spatial hotspot map indicating how an object would be manipulated in a potential interaction, even if the object is currently at rest. Through results with both first- and third-person video, we show the value of grounding affordances in real human-object interactions. Not only are our weakly supervised hotspots competitive with strongly supervised affordance methods, but they can also anticipate object interaction for novel object categories. Project page: http://vision.cs.utexas.edu/projects/interaction-hotspots/

research
12/11/2018

Grounded Human-Object Interaction Hotspots from Video

Learning how to interact with objects is an important step towards embod...
research
03/09/2023

Weakly-Supervised HOI Detection from Interaction Labels Only and Language/Vision-Language Priors

Human-object interaction (HOI) detection aims to extract interacting hum...
research
09/10/2019

Reasoning About Human-Object Interactions Through Dual Attention Networks

Objects are entities we act upon, where the functionality of an object i...
research
05/08/2018

Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction

We study weakly-supervised video object grounding: given a video segment...
research
07/22/2022

Egocentric scene context for human-centric environment understanding from video

First-person video highlights a camera-wearer's activities in the contex...
research
06/16/2020

Learning About Objects by Learning to Interact with Them

Much of the remarkable progress in computer vision has been focused arou...
research
08/23/2023

CHORUS: Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images

We present a method for teaching machines to understand and model the un...
