Hyper-Universal Policy Approximation: Learning to Generate Actions from a Single Image using Hypernets

07/07/2022
by   Dimitrios C. Gklezakos, et al.

Inspired by Gibson's notion of object affordances in human vision, we ask: how can an agent learn to predict an entire action policy for a novel object or environment given only a single glimpse? To tackle this problem, we introduce Universal Policy Functions (UPFs), state-to-action mappings that generalize not only to new goals but, most importantly, to novel, unseen environments. Specifically, we consider the problem of efficiently learning such policies for agents with limited computational and communication capacity, constraints frequently encountered in edge devices. We propose the Hyper-Universal Policy Approximator (HUPA), a hypernetwork-based model that generates small task- and environment-conditional policy networks from a single image, with good generalization properties. Our results show that, when the generated policies are size-constrained, HUPAs significantly outperform an embedding-based alternative. Although this work is restricted to a simple map-based navigation task, future work includes applying the principles behind HUPAs to learning more general affordances for objects and environments.
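
To make the hypernetwork idea concrete, here is a minimal, untrained sketch in plain Python. It is not the authors' architecture: the layer sizes, the single linear hypernetwork layer, the flattened 4x4 "image", and all function names (`make_hypernet`, `generate_policy`, `policy_action_probs`) are assumptions chosen for illustration. It shows only the general pattern, in which one network maps an image and a goal to the full parameter vector of a small policy network, which is then evaluated separately on agent states.

```python
import math
import random

# Assumed sizes for illustration only; none of these come from the paper.
STATE_DIM, HIDDEN_DIM, NUM_ACTIONS = 2, 4, 4
# Parameter count of the generated policy: two dense layers with biases.
POLICY_PARAMS = (STATE_DIM * HIDDEN_DIM + HIDDEN_DIM) + (HIDDEN_DIM * NUM_ACTIONS + NUM_ACTIONS)


def dense(x, weights, biases):
    """Compute W x + b, with `weights` given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def make_hypernet(input_dim, rng):
    """Hypothetical hypernetwork: one random linear layer whose output
    is the flat parameter vector of the policy network."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(input_dim)] for _ in range(POLICY_PARAMS)]
    biases = [0.0] * POLICY_PARAMS
    return weights, biases


def generate_policy(hypernet, image, goal):
    """Run the hypernetwork once and unpack its flat output into policy layers."""
    flat = iter(dense(list(image) + list(goal), *hypernet))
    w1 = [[next(flat) for _ in range(STATE_DIM)] for _ in range(HIDDEN_DIM)]
    b1 = [next(flat) for _ in range(HIDDEN_DIM)]
    w2 = [[next(flat) for _ in range(HIDDEN_DIM)] for _ in range(NUM_ACTIONS)]
    b2 = [next(flat) for _ in range(NUM_ACTIONS)]
    return w1, b1, w2, b2


def policy_action_probs(policy, state):
    """Evaluate the small generated policy: tanh hidden layer, softmax over actions."""
    w1, b1, w2, b2 = policy
    hidden = [math.tanh(h) for h in dense(state, w1, b1)]
    logits = dense(hidden, w2, b2)
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
image = [rng.random() for _ in range(16)]  # stand-in for a flattened 4x4 map image
goal = [0.75, 0.25]                        # assumed normalized goal coordinates
hypernet = make_hypernet(len(image) + len(goal), rng)
policy = generate_policy(hypernet, image, goal)   # one "glimpse" yields a full policy
probs = policy_action_probs(policy, [0.1, 0.9])   # action distribution for one state
```

In this sketch the image is consumed only once, when the policy is generated; afterwards only the 32 generated parameters are needed to act, which is the kind of property that matters under the compute and communication limits the abstract mentions.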


