Language Reward Modulation for Pretraining Reinforcement Learning

08/23/2023
by   Ademi Adeniji, et al.

Using learned reward functions (LRFs) to solve sparse-reward reinforcement learning (RL) tasks has yielded steady progress in task complexity over the years. In this work, we question whether today's LRFs are best suited as a direct replacement for task rewards. Instead, we propose leveraging the capabilities of LRFs as a pretraining signal for RL. Concretely, we propose LAnguage Reward Modulated Pretraining (LAMP), which leverages the zero-shot capabilities of Vision-Language Models (VLMs) as a pretraining utility for RL rather than as a downstream task reward. LAMP uses a frozen, pretrained VLM to scalably generate noisy, albeit shaped, exploration rewards by computing the contrastive alignment between a highly diverse collection of language instructions and the image observations of an agent in its pretraining environment. LAMP optimizes these rewards in conjunction with standard novelty-seeking exploration rewards to acquire a language-conditioned, pretrained policy. Our VLM pretraining approach, which is a departure from previous attempts to use LRFs, can warm-start sample-efficient learning on robot manipulation tasks in RLBench.
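To make the reward-modulation idea concrete, below is a minimal, hypothetical sketch of computing a VLM-shaped exploration reward. It is not the authors' implementation: a CLIP model from Hugging Face transformers stands in for the frozen VLM, the novelty bonus is a stub, and the instruction set and weighting (alpha) are illustrative assumptions.

```python
# Sketch of a LAMP-style pretraining reward: contrastive image-text alignment
# from a frozen VLM, mixed with a novelty-seeking exploration bonus.
# Assumptions (not from the paper): CLIP as the VLM, mean over instructions,
# and a fixed mixing coefficient alpha.
import torch
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
vlm = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def language_reward(image, instructions):
    """Cosine alignment between one image observation and a batch of instructions."""
    inputs = processor(text=instructions, images=image,
                       return_tensors="pt", padding=True).to(device)
    img = vlm.get_image_features(pixel_values=inputs["pixel_values"])
    txt = vlm.get_text_features(input_ids=inputs["input_ids"],
                                attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img @ txt.T).squeeze(0)  # one noisy, shaped reward per instruction

def pretraining_reward(image, instructions, novelty_bonus, alpha=0.5):
    """Combine the VLM alignment reward with a novelty-seeking bonus (placeholder)."""
    r_lang = language_reward(image, instructions).mean().item()
    return alpha * r_lang + (1.0 - alpha) * novelty_bonus
```

In this sketch the resulting scalar would be fed to any standard RL algorithm during pretraining; the downstream task reward would replace it at fine-tuning time.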


Related research

03/05/2019 · Using Natural Language for Reward Shaping in Reinforcement Learning
Recent reinforcement learning (RL) approaches have shown strong performa...

02/13/2023 · Guiding Pretraining in Reinforcement Learning with Large Language Models
Reinforcement learning algorithms typically struggle in the absence of a...

07/18/2023 · Towards A Unified Agent with Foundation Models
Language Models and Vision Language Models have recently demonstrated un...

04/08/2022 · Semantic Exploration from Language Abstractions and Pretrained Representations
Continuous first-person 3D environments pose unique exploration challeng...

09/30/2022 · Improving Policy Learning via Language Dynamics Distillation
Recent work has shown that augmenting environments with language descrip...

05/23/2022 · POLTER: Policy Trajectory Ensemble Regularization for Unsupervised Reinforcement Learning
The goal of Unsupervised Reinforcement Learning (URL) is to find a rewar...

05/25/2022 · RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning
Prompting has shown impressive success in enabling large pretrained lang...
