Meta-Learning for Multi-objective Reinforcement Learning

11/08/2018
by Xi Chen, et al.

Multi-objective reinforcement learning (MORL) generalizes standard reinforcement learning (RL) to sequential decision-making problems with several, possibly conflicting, objectives. In such formulations there is generally no single policy that optimizes all objectives simultaneously; instead, a set of policies must be found, each optimal for a particular preference over the objectives. In this paper, we introduce a novel MORL approach based on training a meta-policy, that is, a policy trained simultaneously on multiple tasks sampled from a task distribution, for a number of randomly sampled Markov decision processes (MDPs). In other words, MORL is framed as a meta-learning problem in which the task distribution is given by a distribution over the preferences. We demonstrate that this formulation yields a better approximation of the Pareto optimal solutions, in terms of both optimality and computational efficiency. We evaluate our method on obtaining Pareto optimal policies for a number of continuous control problems with high degrees of freedom.
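To make the framing concrete, the following is a minimal sketch of how a task distribution over preferences might be represented. It is not the authors' implementation: the abstract does not say how preferences are sampled or how they are combined with the objectives, so the uniform sampling on the probability simplex, the linear (weighted-sum) scalarization, and all function names here are assumptions made for illustration only.

```python
import random


def sample_preference(num_objectives, rng):
    """Draw one preference vector uniformly from the probability simplex.

    Assumption: a flat Dirichlet(1, ..., 1) prior over preferences, obtained
    by normalizing independent Gamma(1, 1) draws.
    """
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_task_batch(num_tasks, num_objectives, seed=0):
    """Sample a batch of tasks; each task is identified by its preference."""
    rng = random.Random(seed)
    return [sample_preference(num_objectives, rng) for _ in range(num_tasks)]


def scalarize(reward_vector, preference):
    """Collapse a vector reward into a scalar by a weighted sum.

    Assumption: linear scalarization; other scalarizations are possible.
    """
    if len(reward_vector) != len(preference):
        raise ValueError("reward and preference must have the same length")
    return sum(w * r for w, r in zip(preference, reward_vector))


def pareto_front(points):
    """Keep the points that no other point dominates (maximization)."""
    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and better somewhere
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))

    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Each sampled preference defines one scalar-reward task for meta-training.
    for preference in sample_task_batch(num_tasks=3, num_objectives=2):
        print(preference, scalarize([1.0, -0.5], preference))
```

Under these assumptions, each sampled preference turns the vector-valued reward into an ordinary scalar reward, so every preference corresponds to one single-objective task. `pareto_front` shows how the returns of several resulting policies could be filtered down to the non-dominated ones.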


