A Distributional View on Multi-Objective Policy Optimization

by Abbas Abdolmaleki, et al.

Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over objectives in their native units. In this paper we propose a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way. We propose to learn an action distribution for each objective, and we use supervised learning to fit a parametric policy to a combination of these distributions. We demonstrate the effectiveness of our approach on challenging high-dimensional real and simulated robotics tasks, and show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
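The recipe in the abstract — reweight actions per objective, multiply the resulting distributions, then fit a parametric policy by supervised learning — can be illustrated with a minimal numpy sketch. This is an assumption-laden toy, not the paper's algorithm: it uses a discrete action set and hand-set per-objective temperatures `eta`, whereas the paper derives the influence of each objective from per-objective constraints in a scale-invariant way.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax.
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def per_objective_distribution(q, eta):
    # Reweight actions by their value under one objective.
    # Smaller eta -> greedier distribution, i.e. this objective
    # exerts a stronger pull on the final policy (hypothetical
    # stand-in for the paper's scale-invariant preference setting).
    return softmax(q / eta)

def combined_target(dists):
    # Combine per-objective distributions by multiplying
    # probabilities (adding log-probs) and renormalizing.
    logp = sum(np.log(d) for d in dists)
    return softmax(logp)

def fit_policy(target, n_steps=2000, lr=0.5):
    # Supervised step: fit a softmax-parameterized policy to the
    # combined target by gradient descent on the cross-entropy;
    # the gradient w.r.t. the logits is (policy - target).
    logits = np.zeros_like(target)
    for _ in range(n_steps):
        logits -= lr * (softmax(logits) - target)
    return softmax(logits)

# Two objectives in direct conflict over three actions.
q1 = np.array([1.0, 0.0, -1.0])
q2 = np.array([-1.0, 0.0, 1.0])

# Equal preference: the conflicting pulls cancel out.
equal = combined_target([per_objective_distribution(q1, 1.0),
                         per_objective_distribution(q2, 1.0)])

# Stronger preference for objective 1 (smaller eta) skews the
# policy toward the action that objective 1 favors.
pref = combined_target([per_objective_distribution(q1, 0.5),
                        per_objective_distribution(q2, 2.0)])
policy = fit_policy(pref)
```

Sweeping the relative temperatures in this toy plays the role that changing preferences plays in the paper: each setting yields a different trade-off between the objectives, tracing out a set of non-dominated policies.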




A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation

We introduce a new algorithm for multi-objective reinforcement learning ...

gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement Learning Approach

In real-world decision optimization, often multiple competing objectives...

Lexicographically Ordered Multi-Objective Clustering

We introduce a rich model for multi-objective clustering with lexicograp...

Model Selection using Multi-Objective Optimization

Choices in scientific research and management require balancing multiple...

Multi-Objective Deep Q-Learning with Subsumption Architecture

In this work we present a method for using Deep Q-Networks (DQNs) in mul...

MoTiAC: Multi-Objective Actor-Critics for Real-Time Bidding

Online real-time bidding (RTB) is known as a complex auction game where ...

Using Logical Specifications of Objectives in Multi-Objective Reinforcement Learning

In the multi-objective reinforcement learning (MORL) paradigm, the relat...