Multi-Objective Coordination Graphs for the Expected Scalarised Returns with Generative Flow Models

07/01/2022
by Conor F. Hayes, et al.

Many real-world problems contain multiple objectives and agents, where a trade-off exists between objectives. Key to solving such problems is exploiting the sparse dependency structures that exist between agents. For example, in wind farm control a trade-off exists between maximising power and minimising stress on the system's components. Dependencies between turbines arise due to the wake effect. We model such sparse dependencies between agents as a multi-objective coordination graph (MO-CoG). In multi-objective reinforcement learning, a utility function is typically used to model a user's preferences over objectives, which may be unknown a priori. In such settings, a set of optimal policies must be computed. Which policies are optimal depends on which optimality criterion applies. If the utility of a user is derived from multiple executions of a policy, the scalarised expected returns (SER) criterion must be optimised. If the utility of a user is derived from a single execution of a policy, the expected scalarised returns (ESR) criterion must be optimised. For example, wind farms are subject to constraints and regulations that must be adhered to at all times; therefore, the ESR criterion must be optimised. For MO-CoGs, state-of-the-art algorithms can only compute a set of optimal policies for the SER criterion, leaving the ESR criterion understudied. To compute a set of optimal policies under the ESR criterion, also known as the ESR set, distributions over the returns must be maintained. Therefore, to compute the ESR set for MO-CoGs, we present a novel distributional multi-objective variable elimination (DMOVE) algorithm. We evaluate DMOVE in realistic wind farm simulations. Given that the returns in real-world wind farm settings are continuous, we use a model known as real-NVP to learn the continuous return distributions used to calculate the ESR set.
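The difference between the two criteria can be illustrated with a small worked example. This is a minimal sketch, not DMOVE and not taken from the paper: the two policies, their return distributions, and the utility function (power counts only if a hypothetical stress limit is respected) are all invented for illustration. SER scores a policy by u(E[R]), which needs only the expected return vector, while ESR scores it by E[u(R)], which needs the full return distribution. With a nonlinear utility like this one, the two criteria can prefer different policies.

```python
from typing import Callable, Dict, List, Tuple

# A (discrete) return distribution: list of (probability, return vector) pairs.
Distribution = List[Tuple[float, Tuple[float, ...]]]

STRESS_LIMIT = 4.0  # hypothetical regulatory limit on component stress


def utility(ret: Tuple[float, ...]) -> float:
    """Hypothetical nonlinear utility: power is worth nothing if the stress limit is violated."""
    power, stress = ret
    return power if stress <= STRESS_LIMIT else 0.0


def expected_return(dist: Distribution) -> Tuple[float, ...]:
    """Component-wise expected return vector E[R]."""
    n_objectives = len(dist[0][1])
    return tuple(sum(p * r[i] for p, r in dist) for i in range(n_objectives))


def ser_value(dist: Distribution, u: Callable[[Tuple[float, ...]], float]) -> float:
    """SER criterion: utility of the expected returns, u(E[R])."""
    return u(expected_return(dist))


def esr_value(dist: Distribution, u: Callable[[Tuple[float, ...]], float]) -> float:
    """ESR criterion: expected utility of the returns, E[u(R)]; requires the whole distribution."""
    return sum(p * u(r) for p, r in dist)


# Hypothetical policies over returns (power, stress).
policies: Dict[str, Distribution] = {
    "A": [(1.0, (6.0, 3.0))],                         # always safe, moderate power
    "B": [(0.5, (10.0, 1.0)), (0.5, (10.0, 6.0))],    # high power, limit violated half the time
}

best_ser = max(policies, key=lambda k: ser_value(policies[k], utility))
best_esr = max(policies, key=lambda k: esr_value(policies[k], utility))

for name, dist in policies.items():
    print(name, ser_value(dist, utility), esr_value(dist, utility))
print("SER prefers", best_ser, "| ESR prefers", best_esr)
```

Running it prints `A 6.0 6.0`, `B 10.0 5.0`, and `SER prefers B | ESR prefers A`. Policy B's expected stress (3.5) is within the limit, so SER rates it highly, but in half of its executions the limit is violated; ESR, which evaluates each single execution, therefore prefers the always-compliant policy A.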
