Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments

04/18/2023
by Alain Andres, et al.

One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve sample efficiency in procedurally generated environments. We consider two settings for using IL from offline data for RL: (1) pre-training a policy before online RL training, and (2) concurrently training a policy with online RL and IL from offline data. We analyse how the quality (optimality of the trajectories) and diversity (number of trajectories and covered levels) of the available offline trajectories affect the effectiveness of both approaches. Across four well-known sparse-reward tasks in the MiniGrid environment, we find that both IL for pre-training and IL applied concurrently during online RL training consistently improve sample efficiency while converging to optimal policies. Furthermore, we show that pre-training a policy from as few as two trajectories can make the difference between learning an optimal policy at the end of online training and not learning at all. Our findings motivate the widespread adoption of IL for pre-training and concurrent IL in procedurally generated environments whenever offline trajectories are available or can be generated.
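The two settings can be made concrete with a short sketch. The following is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a discrete-action PyTorch policy network that maps observations to action logits, and the helper names (bc_pretrain, concurrent_loss) and the weighting coefficient bc_coef are hypothetical.

import torch
import torch.nn.functional as F

def bc_pretrain(policy, offline_batches, optimizer):
    # Setting (1): behavioural-cloning pre-training before online RL.
    # Each batch pairs observations with the actions taken in the
    # offline trajectories; we maximise their log-likelihood.
    for obs, expert_actions in offline_batches:
        logits = policy(obs)                          # (batch, num_actions)
        loss = F.cross_entropy(logits, expert_actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def concurrent_loss(policy, rl_loss, obs, expert_actions, bc_coef=0.5):
    # Setting (2): add an imitation term to the online RL objective
    # (e.g. a PPO loss), so the agent keeps matching the offline data
    # while it optimises the environment reward.
    logits = policy(obs)
    bc_loss = F.cross_entropy(logits, expert_actions)
    return rl_loss + bc_coef * bc_loss

In the concurrent setting, bc_coef trades off imitation against reward maximisation; how much weight the imitation term should receive in practice depends on the quality and coverage of the offline trajectories, which is exactly what the study varies.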

