Actively Learning Costly Reward Functions for Reinforcement Learning

11/23/2022
by Andre Eberhard, et al.

Transfer of recent advances in deep reinforcement learning to real-world applications is hindered by high data demands and, consequently, low efficiency and scalability. Independent improvements to components such as replay buffers, more stable learning algorithms, and massively distributed systems have reduced training time from several days to several hours for standard benchmark tasks. However, while rewards in simulated environments are well-defined and easy to compute, reward evaluation becomes the bottleneck in many real-world environments, e.g., in molecular optimization tasks, where computationally demanding simulations or even experiments are required to evaluate states and quantify rewards. Training can therefore become prohibitively expensive without extensive computational resources and time. We propose to alleviate this problem by replacing costly ground-truth rewards with rewards modeled by neural networks, and by counteracting the non-stationarity of state and reward distributions during training with an active learning component. We demonstrate that our proposed ACRL method (Actively learning Costly rewards for Reinforcement Learning) makes it possible to train agents in complex real-world environments orders of magnitude faster. By enabling the application of reinforcement learning methods to new domains, we show that we can find interesting and non-trivial solutions to real-world optimization problems in chemistry, materials science and engineering.
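The general idea described in the abstract, a cheap learned reward that is refreshed by selectively querying the expensive ground truth, can be illustrated with a toy sketch. This is not the ACRL implementation: it assumes a one-dimensional state, swaps the neural-network reward model for a nearest-neighbour surrogate, uses distance to the nearest labelled state as a stand-in for an active-learning query criterion, and replaces the reinforcement learning agent with simple hill climbing. All names (`CostlyOracle`, `NearestNeighbourRewardModel`, `train`, `query_threshold`) are hypothetical.

```python
import random


class CostlyOracle:
    """Hypothetical stand-in for an expensive simulation; counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return -(x - 2.0) ** 2  # toy ground-truth reward, maximal at x = 2


class NearestNeighbourRewardModel:
    """Toy surrogate (assumption: replaces the neural network of the abstract)."""

    def __init__(self):
        self.data = []  # labelled (state, reward) pairs

    def add(self, x, r):
        self.data.append((x, r))

    def predict(self, x):
        # Return the reward of the closest labelled state and the distance to it.
        nearest_x, nearest_r = min(self.data, key=lambda p: abs(p[0] - x))
        return nearest_r, abs(nearest_x - x)


def train(steps=1000, query_threshold=0.25, seed=0):
    rng = random.Random(seed)
    oracle = CostlyOracle()
    model = NearestNeighbourRewardModel()

    state = 0.0
    model.add(state, oracle(state))  # one ground-truth label to start from
    current_reward, _ = model.predict(state)

    for _ in range(steps):
        candidate = state + rng.uniform(-0.5, 0.5)
        reward, distance = model.predict(candidate)
        # Active-learning step: pay for the oracle only when the candidate
        # lies far from every labelled state, then extend the surrogate.
        if distance > query_threshold:
            reward = oracle(candidate)
            model.add(candidate, reward)
        # Hill climbing stands in for the agent's policy improvement.
        if reward >= current_reward:
            state, current_reward = candidate, reward

    return state, oracle.calls, steps


if __name__ == "__main__":
    final_state, oracle_calls, total_steps = train()
    print(f"final state: {final_state:.2f}")
    print(f"oracle calls: {oracle_calls} of {total_steps} reward evaluations")
```

In this sketch the oracle is consulted only for states the surrogate has not yet covered, so most reward evaluations are served by the cheap model. The querying is needed because the states visited shift as the search improves, which mirrors the non-stationarity the abstract mentions.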


research
05/21/2017

Experience enrichment based task independent reward model

For most reinforcement learning approaches, the learning is performed by...
research
10/21/2019

Dealing with Sparse Rewards in Reinforcement Learning

Successfully navigating a complex environment to obtain a desired outcom...
research
10/07/2021

Robotic Lever Manipulation using Hindsight Experience Replay and Shapley Additive Explanations

This paper deals with robotic lever control using Explainable Deep Reinf...
research
01/01/2020

Reinforcement Learning with Goal-Distance Gradient

Reinforcement learning usually uses the feedback rewards of environmenta...
research
07/16/2021

Decentralized Multi-Agent Reinforcement Learning for Task Offloading Under Uncertainty

Multi-Agent Reinforcement Learning (MARL) is a challenging subarea of Re...
research
04/13/2021

Reward Shaping with Dynamic Trajectory Aggregation

Reinforcement learning, which acquires a policy maximizing long-term rew...
research
05/20/2022

Learning Dense Reward with Temporal Variant Self-Supervision

Rewards play an essential role in reinforcement learning. In contrast to...
