Toward negotiable reinforcement learning: shifting priorities in Pareto optimal sequential decision-making

01/05/2017
by Andrew Critch, et al.

Existing multi-objective reinforcement learning (MORL) algorithms do not account for objectives that arise from players with differing beliefs. Concretely, consider two players with different beliefs and utility functions who may cooperate to build a machine that takes actions on their behalf. A representation is needed for how much the machine's policy will prioritize each player's interests over time. Assuming the players have reached common knowledge of their situation, this paper derives a recursion that any Pareto optimal policy must satisfy. Two qualitative observations can be made from the recursion: the machine must (1) use each player's own beliefs in evaluating how well an action will serve that player's utility function, and (2) shift the relative priority it assigns to each player's expected utilities over time, by a factor proportional to how well that player's beliefs predict the machine's inputs. Observation (2) represents a substantial divergence from naïve linear utility aggregation (as in Harsanyi's utilitarian theorem, and existing MORL algorithms), which is shown here to be inadequate for Pareto optimal sequential decision-making on behalf of players with different beliefs.
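
The two observations can be illustrated with a minimal sketch. This is not the paper's recursion, which the abstract does not state. It is a simplified stand-in under stated assumptions: each player holds a priority weight, each weight is rescaled by the probability that player's beliefs assigned to the observation the machine just received, and actions are scored by the weighted sum of each player's expected utility computed under that player's own beliefs. The function names, the dictionary layout, and the renormalization step are hypothetical choices made for readability.

```python
def update_priorities(weights, likelihoods):
    """Shift priorities after one observation (illustrative assumption).

    weights[i]     -- current priority of player i
    likelihoods[i] -- probability player i's beliefs assigned to the
                      observation the machine actually received
    """
    scaled = [w * p for w, p in zip(weights, likelihoods)]
    total = sum(scaled)
    if total == 0:
        raise ValueError("every player assigned zero probability to the observation")
    # Renormalizing is a convenience only; it does not change which action wins.
    return [s / total for s in scaled]


def choose_action(weights, expected_utils):
    """Pick the action with the highest priority-weighted expected utility.

    expected_utils[action][i] is player i's expected utility for `action`,
    evaluated under player i's own beliefs (observation 1 in the abstract).
    """
    return max(
        expected_utils,
        key=lambda a: sum(w * u for w, u in zip(weights, expected_utils[a])),
    )


if __name__ == "__main__":
    weights = [0.5, 0.5]
    # Player 0 predicted the observation well, player 1 predicted it poorly.
    weights = update_priorities(weights, [0.9, 0.1])
    options = {"favor_player_0": [1.0, 0.0], "favor_player_1": [0.0, 1.0]}
    print(weights, choose_action(weights, options))
```

With fixed weights, as in naïve linear aggregation, the first function would never be called. In this sketch the weights move toward the player whose beliefs predicted the machine's inputs better, which is the qualitative behavior described in observation (2).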


research
10/31/2017

Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making

It is often argued that an agent making decisions on behalf of two or mo...
research
09/07/2021

Contest Design with Threshold Objectives

We study contests where the designer's objective is an extension of the ...
research
11/08/2018

Meta-Learning for Multi-objective Reinforcement Learning

Multi-objective reinforcement learning (MORL) is the generalization of s...
research
09/08/2022

Valuing Players Over Time

In soccer (or association football), players quickly go from heroes to z...
research
09/20/2022

Mutual knowledge of rationality and correct beliefs in n-person games: An impossibility theorem

There are two well-known sufficient conditions for Nash equilibrium: com...
research
06/22/2021

Are the Players in an Interactive Belief Model Meta-certain of the Model Itself?

In an interactive belief model, are the players "commonly meta-certain" ...
research
07/11/2023

Sequential Language-based Decisions

In earlier work, we introduced the framework of language-based decisions...
