Grounding 3D Object Affordance from 2D Interactions in Images

03/18/2023
by   Yuhang Yang, et al.

Grounding 3D object affordance seeks to locate the "action possibility" regions of objects in 3D space, serving as a link between perception and operation for embodied agents. Existing studies primarily focus on connecting visual affordances with geometric structures, e.g., relying on annotations to declare interactive regions of interest on the object and establishing a mapping between those regions and affordances. However, the essence of learning object affordance is understanding how to use the object, and a paradigm that detaches affordance from interactions generalizes poorly. Humans, by contrast, can perceive object affordances in the physical world from demonstration images or videos. Motivated by this, we introduce a novel task setting: grounding 3D object affordance from 2D interactions in images, which poses the challenge of anticipating affordance from interactions observed in a different source. To address this problem, we devise a novel Interaction-driven 3D Affordance Grounding Network (IAG), which aligns region features of objects from different sources and models the interactive contexts for 3D object affordance grounding. In addition, we collect a Point-Image Affordance Dataset (PIAD) to support the proposed task. Comprehensive experiments on PIAD demonstrate the reliability of the proposed task and the superiority of our method. The project is available at https://github.com/yyvhang/IAGNet.
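To make the task setting concrete, the following is a minimal sketch of the kind of cross-source fusion the abstract describes: per-point features from an object point cloud are conditioned on interaction-region features extracted from a 2D image, yielding a per-point affordance heatmap. All names, shapes, and the attention-based fusion are illustrative assumptions, not the actual IAG architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ground_affordance(point_feats, region_feats):
    """Hypothetical sketch of interaction-driven affordance grounding.

    point_feats:  (N, D) features for N points of the object point cloud
    region_feats: (M, D) features for M interaction regions from the image
    returns:      (N,)   per-point affordance heatmap in (0, 1)
    """
    d = point_feats.shape[1]
    # Cross-attention: each 3D point attends to the 2D interaction regions.
    attn = softmax(point_feats @ region_feats.T / np.sqrt(d), axis=-1)  # (N, M)
    fused = attn @ region_feats                 # (N, D) interaction-conditioned features
    scores = (fused * point_feats).sum(axis=1)  # per-point alignment score
    return 1.0 / (1.0 + np.exp(-scores))        # squash to (0, 1)

rng = np.random.default_rng(0)
heatmap = ground_affordance(rng.normal(size=(2048, 64)),
                            rng.normal(size=(5, 64)))
print(heatmap.shape)  # (2048,)
```

In a real model the two feature sets would come from learned point-cloud and image backbones; here random features simply exercise the fusion step.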

