Learning Rewards to Optimize Global Performance Metrics in Deep Reinforcement Learning

03/16/2023
by Junqi Qian, et al.

When applying reinforcement learning (RL) to a new problem, reward engineering is a necessary but often difficult and error-prone task that the system designer has to face. To avoid this step, we propose LR4GPM, a novel (deep) RL method that can optimize a global performance metric, which is assumed to be available as part of the problem description. LR4GPM alternates between two phases: (1) learning a (possibly vector-valued) reward function used to fit the performance metric, and (2) training a policy to optimize an approximation of this performance metric based on the learned rewards. Such RL training is not straightforward, since both the reward function and the policy are trained on non-stationary data. To overcome this issue, we propose several training tricks. We demonstrate the efficiency of LR4GPM on several domains. Notably, LR4GPM outperforms the winner of a recent autonomous driving competition organized at DAI'2020.
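The two alternating phases described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the environment, the linear per-action reward model, the least-squares fit, the REINFORCE update, and every name below (`global_metric`, `fit_rewards`, `improve_policy`, `train`) are assumptions made purely for illustration. The sliding buffer of recent episodes is only a simple stand-in for handling non-stationary data, not one of the paper's training tricks.

```python
import numpy as np

def global_metric(actions, true_values):
    # Hypothetical episode-level score, observed only once per episode.
    return float(true_values[actions].sum())

def sample_episode(logits, horizon, rng):
    # Softmax policy over a fixed set of actions (state-free toy setting).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(logits), size=horizon, p=probs), probs

def fit_rewards(episodes, scores, n_actions):
    # Phase 1: fit per-action rewards so that their sum over an episode
    # matches the observed global metric (ordinary least squares).
    counts = np.array(
        [np.bincount(ep, minlength=n_actions) for ep in episodes], dtype=float
    )
    rewards, *_ = np.linalg.lstsq(counts, np.array(scores), rcond=None)
    return rewards

def improve_policy(logits, rewards, horizon, rng, n_episodes=32, lr=0.1):
    # Phase 2: one REINFORCE step that uses the learned rewards only.
    grad = np.zeros_like(logits)
    for _ in range(n_episodes):
        actions, probs = sample_episode(logits, horizon, rng)
        learned_return = rewards[actions].sum()
        baseline = horizon * (probs @ rewards)  # expected learned return
        counts = np.bincount(actions, minlength=len(logits))
        # Gradient of the episode log-probability w.r.t. the logits.
        grad += (learned_return - baseline) * (counts - horizon * probs)
    return logits + lr * grad / n_episodes

def train(true_values, horizon=5, rounds=30, seed=0):
    rng = np.random.default_rng(seed)
    n_actions = len(true_values)
    logits = np.zeros(n_actions)
    episodes, scores = [], []
    for _ in range(rounds):
        # Collect fresh episodes with the current policy.
        for _ in range(16):
            actions, _ = sample_episode(logits, horizon, rng)
            episodes.append(actions)
            scores.append(global_metric(actions, true_values))
        # Keep only recent data, since the policy (and thus the data) drifts.
        episodes, scores = episodes[-64:], scores[-64:]
        rewards = fit_rewards(episodes, scores, n_actions)
        logits = improve_policy(logits, rewards, horizon, rng)
    return logits, rewards
```

Calling `train(np.array([0.0, 1.0, 3.0]))` returns the final policy logits and the last fitted rewards. In this toy setting the policy should come to favor the action with the largest hidden contribution to the metric, even though it is never shown a per-step reward from the environment.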

research
06/05/2023

Risk-Aware Reward Shaping of Reinforcement Learning Agents for Autonomous Driving

Reinforcement learning (RL) is an effective approach to motion planning ...
research
08/18/2020

Learning Fair Policies in Multiobjective (Deep) Reinforcement Learning with Average and Discounted Rewards

As the operations of autonomous systems generally affect simultaneously ...
research
03/22/2019

Jet grooming through reinforcement learning

We introduce a novel implementation of a reinforcement learning (RL) alg...
research
02/12/2021

Disturbing Reinforcement Learning Agents with Corrupted Rewards

Reinforcement Learning (RL) algorithms have led to recent successes in s...
research
05/10/2019

Autonomous Management of Energy-Harvesting IoT Nodes Using Deep Reinforcement Learning

Reinforcement learning (RL) is capable of managing wireless, energy-harv...
research
02/22/2021

Exploring Supervised and Unsupervised Rewards in Machine Translation

Reinforcement Learning (RL) is a powerful framework to address the discr...
research
03/02/2023

T-Cell Receptor Optimization with Reinforcement Learning and Mutation Policies for Precision Immunotherapy

T cells monitor the health status of cells by identifying foreign peptid...
