Lexicographic Multi-Objective Reinforcement Learning

12/28/2022
by   Joar Skalse, et al.
0

In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems. These are problems that involve multiple reward signals, and where the goal is to learn a policy that maximises the first reward signal, and subject to this constraint also maximises the second reward signal, and so on. We present a family of both action-value and policy gradient algorithms that can be used to solve such problems, and prove that they converge to policies that are lexicographically optimal. We evaluate the scalability and performance of these algorithms empirically, demonstrating their practical applicability. As a more specific application, we show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms.

READ FULL TEXT
research
09/17/2018

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Many reinforcement-learning researchers treat the reward function as a p...
research
05/28/2018

Reward Constrained Policy Optimization

Teaching agents to perform tasks using Reinforcement Learning is no easy...
research
03/16/2023

Psychotherapy AI Companion with Reinforcement Learning Recommendations and Interpretable Policy Dynamics

We introduce a Reinforcement Learning Psychotherapy AI Companion that ge...
research
01/17/2022

Chaining Value Functions for Off-Policy Learning

To accumulate knowledge and improve its policy of behaviour, a reinforce...
research
02/12/2019

Value constrained model-free continuous control

The naive application of Reinforcement Learning algorithms to continuous...
research
11/16/2022

Addressing the issue of stochastic environments and local decision-making in multi-objective reinforcement learning

Multi-objective reinforcement learning (MORL) is a relatively new field ...
research
08/29/2023

Policy composition in reinforcement learning via multi-objective policy optimization

We enable reinforcement learning agents to learn successful behavior pol...

Please sign up or login with your details

Forgot password? Click here to reset