Visual Semantic Planning using Deep Successor Representations

05/23/2017
by   Yuke Zhu, et al.
0

A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world. In this work, we address the problem of visual semantic planning: the task of predicting a sequence of actions from visual observations that transform a dynamic environment from an initial state to a goal state. Doing so entails knowledge about objects and their affordances, as well as actions and their preconditions and effects. We propose learning these through interacting with a visual and dynamic environment. Our proposed solution involves bootstrapping reinforcement learning with imitation learning. To ensure cross task generalization, we develop a deep predictive model based on successor representations. Our experimental results show near optimal results across a wide range of tasks in the challenging THOR environment.

READ FULL TEXT

page 1

page 3

page 8

page 11

research
12/25/2019

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

A crucial aspect of mobile intelligent agents is their ability to integr...
research
12/04/2019

Visual Reaction: Learning to Play Catch with Your Drone

In this paper we address the problem of visual reaction: the task of int...
research
07/21/2020

PackIt: A Virtual Environment for Geometric Planning

The ability to jointly understand the geometry of objects and plan actio...
research
08/14/2023

Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents

Accomplishing household tasks requires to plan step-by-step actions cons...
research
12/17/2019

Design and Implementation of Linked Planning Domain Definition Language

Planning is a critical component of any artificial intelligence system t...
research
04/29/2015

Anticipating Visual Representations from Unlabeled Video

Anticipating actions and objects before they start or appear is a diffic...
research
08/14/2023

DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation

VLN-CE is a recently released embodied task, where AI agents need to nav...

Please sign up or login with your details

Forgot password? Click here to reset