EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL

06/20/2022
by Thomas Carta, et al.

Reinforcement learning (RL) in long-horizon, sparse-reward tasks is notoriously difficult and requires many training steps. A standard way to speed up learning is to provide additional reward signals, shaping the reward so that it better guides the learning process. In the context of language-conditioned RL, the abstraction and generalisation properties of the language input offer opportunities for more efficient ways of shaping the reward. In this paper, we build on this idea and propose an automated reward shaping method in which the agent extracts auxiliary objectives from the general language goal. These auxiliary objectives rely on a question generation (QG) and question answering (QA) system: they consist of questions that lead the agent to reconstruct partial information about the global goal from its own trajectory. When it succeeds, it receives an intrinsic reward proportional to its confidence in its answer. This incentivizes the agent to generate trajectories that unambiguously explain various aspects of the general language goal. Our experimental study shows that this approach, which requires no engineer intervention to design the auxiliary objectives, improves sample efficiency by effectively directing exploration.
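The question-and-answer reward described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that questions are formed by masking one word of the goal at a time, that the learned QA model is replaced by a toy scorer, and that the intrinsic reward is the QA confidence summed over correctly answered questions and averaged over all questions. All names (`generate_questions`, `intrinsic_reward`, `toy_qa`, `VOCAB`) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# A question is a (masked goal, expected answer) pair.
Question = Tuple[str, str]
# A QA model maps (trajectory, question) to a probability per candidate answer.
QAModel = Callable[[Sequence[str], str], Dict[str, float]]

MASK = "<mask>"


def generate_questions(goal: str, maskable: Set[str]) -> List[Question]:
    """Hypothetical QG step: mask one maskable word of the goal at a time."""
    words = goal.split()
    questions = []
    for i, word in enumerate(words):
        if word in maskable:
            masked = " ".join(words[:i] + [MASK] + words[i + 1:])
            questions.append((masked, word))
    return questions


def intrinsic_reward(trajectory: Sequence[str], questions: List[Question],
                     qa: QAModel) -> float:
    """Confidence on correctly answered questions, averaged (assumed aggregation)."""
    if not questions:
        return 0.0
    total = 0.0
    for question, answer in questions:
        probs = qa(trajectory, question)
        predicted = max(probs, key=probs.get)
        if predicted == answer:
            total += probs[predicted]  # reward proportional to QA confidence
    return total / len(questions)


# Hypothetical candidate answers for the toy QA model below.
VOCAB = ["ball", "blue", "key", "red"]


def toy_qa(trajectory: Sequence[str], question: str) -> Dict[str, float]:
    """Hypothetical stand-in for a learned QA model.

    Candidates already visible in the question are excluded; candidates that
    appear in the trajectory get three times the weight of unseen ones.
    """
    seen = {w for step in trajectory for w in step.split()}
    visible = set(question.split())
    candidates = [w for w in VOCAB if w not in visible]
    scores = {w: (3.0 if w in seen else 1.0) for w in candidates}
    norm = sum(scores.values())
    return {w: s / norm for w, s in scores.items()}


if __name__ == "__main__":
    questions = generate_questions("pick up the red ball", set(VOCAB))
    print(questions)
    good = ["agent goes to red ball", "agent picks up red ball"]
    print(intrinsic_reward(good, questions, toy_qa))
    print(intrinsic_reward(["agent goes to blue key"], questions, toy_qa))
```

With the goal "pick up the red ball", a trajectory that visits the red ball lets the toy QA model recover both masked words with confidence 0.6, so the reward is 0.6. A trajectory that only visits the blue key answers neither question correctly and earns 0.0. This mirrors the incentive the abstract describes: trajectories that unambiguously reveal parts of the goal are rewarded.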
