A Distributional View on Multi-Objective Policy Optimization

05/15/2020
by Abbas Abdolmaleki, et al.

Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over objectives in their native units. In this paper we propose a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way. We propose to learn an action distribution for each objective, and we use supervised learning to fit a parametric policy to a combination of these distributions. We demonstrate the effectiveness of our approach on challenging high-dimensional real and simulated robotics tasks, and show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
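The abstract does not give the update equations, so the following is only a minimal sketch of one way the two steps it names could look for a single state with a small discrete action set. Everything here is an assumption made for illustration: the function names, the per-objective KL budget `epsilon` used as the preference knob, the exponential reweighting of the current policy, and the closed-form fit for a categorical policy. It is not the authors' implementation. The sketch builds one improved action distribution per objective, then fits the policy to their combination by minimizing the summed cross-entropy, which for an unconstrained categorical policy is simply the average of the target distributions.

```python
import numpy as np

def kl(q, p):
    """KL(q || p) for categorical distributions; p must be strictly positive."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def reweight(prior, values, temperature):
    """Return q(a) proportional to prior(a) * exp(values(a) / temperature)."""
    logits = np.log(prior) + (values - values.max()) / temperature
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()

def improved_distribution(prior, values, epsilon, iters=200):
    """Per-objective target: reweight the prior until KL(q || prior) ~= epsilon.

    Assumption: the preference for an objective is expressed as a KL budget
    rather than a reward weight. The temperature is found by bisection in
    log space, relative to the spread of the values, so rescaling the values
    by a positive constant leaves the resulting distribution unchanged.
    """
    spread = values.max() - values.min()
    if spread == 0.0:
        return prior.copy()  # objective is indifferent between actions
    lo, hi = np.log(spread) - 30.0, np.log(spread) + 30.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        q = reweight(prior, values, np.exp(mid))
        if kl(q, prior) > epsilon:
            lo = mid  # too greedy for the budget: raise the temperature
        else:
            hi = mid
    return reweight(prior, values, np.exp(hi))

def fit_policy(targets):
    """Supervised fit of a categorical policy to several target distributions.

    Minimizing sum_k CrossEntropy(q_k, pi) over all categorical pi has the
    closed-form solution pi = mean_k q_k, used here in place of gradient steps.
    """
    return np.mean(targets, axis=0)

def multi_objective_step(prior, values_per_objective, epsilons):
    """One policy-improvement step combining all objectives."""
    targets = [
        improved_distribution(prior, np.asarray(v, dtype=float), eps)
        for v, eps in zip(values_per_objective, epsilons)
    ]
    return fit_policy(targets)

if __name__ == "__main__":
    prior = np.full(4, 0.25)                 # current policy, uniform
    q_task = [1.0, 0.0, 0.0, 0.5]            # objective 1, small scale
    q_cost = [0.0, 0.0, 1000.0, 500.0]       # objective 2, large scale
    print(multi_objective_step(prior, [q_task, q_cost], epsilons=[0.1, 0.1]))
```

In this sketch, raising the `epsilon` of one objective lets that objective pull the policy further toward its preferred actions, regardless of the numerical scale of its values; a parametric (for example, neural network) policy would replace the closed-form average with gradient-based maximum likelihood.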


08/21/2019
A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation
We introduce a new algorithm for multi-objective reinforcement learning ...

04/11/2022
gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement Learning Approach
In real-world decision optimization, often multiple competing objectives...

03/02/2019
Lexicographically Ordered Multi-Objective Clustering
We introduce a rich model for multi-objective clustering with lexicograp...

10/25/2018
Model Selection using Multi-Objective Optimization
Choices in scientific research and management require balancing multiple...

04/21/2017
Multi-Objective Deep Q-Learning with Subsumption Architecture
In this work we present a method for using Deep Q-Networks (DQNs) in mul...

02/18/2020
MoTiAC: Multi-Objective Actor-Critics for Real-Time Bidding
Online real-time bidding (RTB) is known as a complex auction game where ...

10/03/2019
Using Logical Specifications of Objectives in Multi-Objective Reinforcement Learning
In the multi-objective reinforcement learning (MORL) paradigm, the relat...