O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance Learning

by Kaichun Mo, et al.

In contrast to the vast literature on modeling, perceiving, and understanding agent-object (e.g., human-object, hand-object, robot-object) interaction in computer vision and robotics, very few past works have studied object-object interaction, which also plays an important role in robotic manipulation and planning tasks. Daily life offers a rich space of object-object interaction scenarios, such as placing an object on a messy tabletop, fitting an object inside a drawer, or pushing an object using a tool. In this paper, we propose a unified affordance learning framework to learn object-object interaction for various tasks. By constructing four object-object interaction task environments using physical simulation (SAPIEN) and thousands of ShapeNet models with rich geometric diversity, we are able to conduct large-scale object-object affordance learning without the need for human annotations or demonstrations. As our core technical contribution, we propose an object-kernel point convolution network to reason about detailed interaction between two objects. Experiments on large-scale synthetic data and real-world data demonstrate the effectiveness of the proposed approach. Please refer to the project webpage for code, data, video, and more materials: https://cs.stanford.edu/~kaichun/o2oafford




H2O: A Benchmark for Visual Human-human Object Handover Analysis

Object handover is a common human collaboration behavior that attracts a...

ManiSkill: Learning-from-Demonstrations Benchmark for Generalizable Manipulation Skills

Learning generalizable manipulation skills is central for robots to achi...

VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects

Perceiving and manipulating 3D articulated objects (e.g., cabinets, door...

DualAfford: Learning Collaborative Visual Affordance for Dual-gripper Object Manipulation

It is essential yet challenging for future home-assistant robots to unde...

Detailed 2D-3D Joint Representation for Human-Object Interaction

Human-Object Interaction (HOI) detection lies at the core of action unde...

Beyond Holistic Object Recognition: Enriching Image Understanding with Part States

Important high-level vision tasks such as human-object interaction, imag...

Low-Cost Scene Modeling using a Density Function Improves Segmentation Performance

We propose a low cost and effective way to combine a free simulation sof...