Temporal Video-Language Alignment Network for Reward Shaping in Reinforcement Learning

02/08/2023
by Ziyuan Cao, et al.

Designing appropriate reward functions for Reinforcement Learning (RL) approaches has been a significant problem, especially for complex environments such as Atari games. Using natural language instructions to provide intermediate rewards to RL agents, a process known as reward shaping, can help the agent reach the goal state faster. In this work, we propose a natural language-based reward shaping approach that maps trajectories from the Montezuma's Revenge game environment to corresponding natural language instructions using an extension of the LanguagE-Action Reward Network (LEARN) framework, referred to as Ext-LEARN. These trajectory-language mappings are then used to generate intermediate rewards, which are integrated into reward functions that any standard RL algorithm can use to learn an optimal policy. For a set of 15 tasks from Atari's Montezuma's Revenge game, the Ext-LEARN approach completes tasks successfully more often on average than the reward shaping approach that uses the LEARN framework, and performs even better than the reward shaping framework without natural language-based rewards.
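As a rough illustration of how intermediate rewards can be folded into an environment's reward signal, the sketch below uses potential-based shaping. This is an assumption made for illustration only: the abstract does not state the exact shaping formula, and the names `shaped_rewards`, `potentials`, `gamma`, and `weight` are hypothetical. Each potential stands in for a score of how well the trajectory so far matches the language instruction.

```python
from typing import List, Sequence


def shaped_rewards(
    env_rewards: Sequence[float],
    potentials: Sequence[float],
    gamma: float = 0.99,
    weight: float = 1.0,
) -> List[float]:
    """Add a potential-based shaping term to each environment reward.

    `potentials` is a hypothetical per-state score (one entry per visited
    state, so one more entry than `env_rewards`). In a language-based
    setup it would come from a learned trajectory-instruction model;
    here it is simply passed in as numbers.
    """
    if len(potentials) != len(env_rewards) + 1:
        raise ValueError("need one potential per state: len(env_rewards) + 1")
    shaped = []
    for t, reward in enumerate(env_rewards):
        # Shaping term: discounted potential of the next state minus
        # the potential of the current state.
        bonus = gamma * potentials[t + 1] - potentials[t]
        shaped.append(reward + weight * bonus)
    return shaped


if __name__ == "__main__":
    # Two steps: the instruction-match score rises after the first step,
    # and the environment pays out only on the second step.
    print(shaped_rewards([0.0, 1.0], [0.0, 0.5, 0.5], gamma=1.0))
```

The shaped rewards can then replace the raw environment rewards in whichever RL algorithm is being trained; the `weight` parameter controls how strongly the language signal influences learning.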

