gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement Learning Approach

04/11/2022
by Johannes Dornheim, et al.

In real-world decision optimization, multiple competing objectives must often be taken into account. In classical reinforcement learning, these objectives have to be combined into a single reward function. In contrast, multi-objective reinforcement learning (MORL) methods learn from vectors of per-objective rewards. Multi-policy MORL optimizes sets of decision policies, one for each of various preferences regarding the conflicting objectives. This is especially important when target preferences are not known during training or when preferences change dynamically during application. While it is, in general, straightforward to extend a single-objective reinforcement learning method to MORL based on linear scalarization, the solutions reachable by these methods are limited to convex regions of the Pareto front. Non-linear MORL methods like Thresholded Lexicographic Ordering (TLO) are designed to overcome this limitation. Generalized MORL methods use function approximation to generalize across objective preferences and thereby implicitly learn multiple policies in a data-efficient manner, even for complex decision problems with high-dimensional or continuous state spaces. In this work, we propose generalized Thresholded Lexicographic Ordering (gTLO), a novel method that aims to combine non-linear MORL with the advantages of generalized MORL. We introduce a deep reinforcement learning realization of the algorithm and present promising results on a standard benchmark for non-linear MORL and a real-world application from the domain of manufacturing process control.
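The limitation of linear scalarization mentioned in the abstract can be illustrated with a small sketch. It is not the gTLO algorithm from the paper; it only contrasts greedy action selection under linear scalarization with the commonly described TLO rule (clamp every objective except the last at its threshold, then compare lexicographically). The function names, the toy Q-values, and the threshold are assumptions chosen for illustration.

```python
import numpy as np


def linear_scalarization_action(q, weights):
    """Greedy action under a weighted sum of the per-objective Q-values."""
    return int(np.argmax(q @ weights))


def tlo_action(q, thresholds):
    """Greedy action under thresholded lexicographic ordering (illustrative).

    q          : array of shape (n_actions, n_objectives)
    thresholds : one threshold per objective except the last, which is
                 left unthresholded and simply maximized.
    """
    candidates = np.arange(q.shape[0])
    n_objectives = q.shape[1]
    for i in range(n_objectives):
        values = q[candidates, i]
        if i < n_objectives - 1:
            # Values above the threshold are treated as equally good.
            values = np.minimum(values, thresholds[i])
        # Keep only the actions that are best on this (clamped) objective.
        candidates = candidates[values == values.max()]
        if len(candidates) == 1:
            break
    return int(candidates[0])


# Hypothetical Q-vectors for three actions and two objectives. Action 1 is
# Pareto-optimal but lies in a concave region of the front.
q = np.array([[1.0, 0.0],
              [0.4, 0.4],
              [0.0, 1.0]])

# No weight vector makes the weighted sum prefer action 1.
linear_choices = {
    linear_scalarization_action(q, np.array([w, 1.0 - w]))
    for w in np.linspace(0.0, 1.0, 101)
}

# Requiring "at least 0.4 on objective 0", then maximizing objective 1,
# selects action 1.
tlo_choice = tlo_action(q, thresholds=[0.4])
```

In this toy setting the weighted sum only ever picks one of the two extreme actions, while the thresholded rule reaches the compromise action. A generalized method in the sense of the abstract would additionally condition the value function on the preference parameters instead of fixing them in advance.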


