GIFT: Generalizable Interaction-aware Functional Tool Affordances without Labels

by Dylan Turpin, et al.

Tool use requires reasoning about the fit between an object's affordances and the demands of a task. Visual affordance learning can benefit from goal-directed interaction experience, but current techniques rely on human labels or expert demonstrations to generate this data. In this paper, we describe a method that grounds affordances in physical interactions instead, thus removing the need for human labels or expert policies. We use an efficient sampling-based method to generate successful trajectories that provide contact data, which are then used to reveal affordance representations. Our framework, GIFT, operates in two phases: first, we discover visual affordances from goal-directed interaction with a set of procedurally generated tools; second, we train a model to predict new instances of the discovered affordances on novel tools in a self-supervised fashion. In our experiments, we show that GIFT can leverage a sparse keypoint representation to predict grasp and interaction points to accommodate multiple tasks, such as hooking, reaching, and hammering. GIFT outperforms baselines on all tasks and matches a human oracle on two of three tasks using novel tools.
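The discovery phase described above (sample interactions, keep the successful ones, read affordances off the resulting contact data) can be illustrated with a toy sketch. This is not the authors' implementation: the 2D L-shaped tool, the `toy_hook_success` check, and the use of a simple centroid as the "keypoint" are all assumptions made for illustration, standing in for a physics simulation and a learned representation.

```python
import random

def sample_successful_pairs(tool_points, is_success, n_samples=2000, seed=0):
    """Rejection sampling: draw random (grasp, contact) point pairs on the tool
    and keep only those that the (hypothetical) success check accepts."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_samples):
        grasp = rng.choice(tool_points)
        contact = rng.choice(tool_points)
        if is_success(grasp, contact):
            kept.append((grasp, contact))
    return kept

def centroid(points):
    """Mean of a list of 2D points; used here as a crude sparse keypoint."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

# Toy L-shaped tool (assumption): a handle along y = 0 and a head at x = 9.
handle = [(float(x), 0.0) for x in range(10)]
head = [(9.0, float(y)) for y in range(1, 4)]
tool = handle + head

def toy_hook_success(grasp, contact):
    """Stand-in for a simulated task outcome (assumption): hooking 'succeeds'
    when the grasp is near the far end of the handle and contact is on the head."""
    return grasp[1] == 0.0 and grasp[0] <= 3.0 and contact[1] > 0.0

pairs = sample_successful_pairs(tool, toy_hook_success)
grasp_keypoint = centroid([g for g, _ in pairs])
contact_keypoint = centroid([c for _, c in pairs])
print("grasp keypoint:", grasp_keypoint)
print("interaction keypoint:", contact_keypoint)
```

In this toy setting the grasp keypoint lands on the handle and the interaction keypoint on the head, which mirrors the idea that contact data from successful trajectories, rather than human labels, indicates where to hold a tool and where it should touch the target.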


