Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning: A Dynamic Weight-based Approach

04/27/2023
by Junlin Lu, et al.

Many decision-making problems feature multiple objectives, and the preferences of a decision-maker over those objectives are not always known. It is, however, often possible to observe the decision-maker's behavior. In multi-objective decision-making, preference inference is the process of inferring a decision-maker's preferences for the different objectives. This research proposes a Dynamic Weight-based Preference Inference (DWPI) algorithm that infers the preferences of agents acting in multi-objective decision-making problems from observed behavior trajectories in the environment. The method is evaluated on three multi-objective Markov decision processes: Deep Sea Treasure, Traffic, and Item Gathering. DWPI is compared to two existing preference inference methods from the literature, and empirical results demonstrate significant improvements over these baselines in both time requirements and accuracy of the inferred preferences. DWPI also maintains its performance when inferring preferences from sub-optimal behavior demonstrations. In addition, it does not require any interaction during training with the agent whose preferences are inferred; all that is required is a trajectory of observed behavior.
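To make the notion of preference inference concrete, the following is a minimal sketch and not the DWPI algorithm itself, whose details are not given in this abstract. It assumes that preferences are a weight vector over two objectives, that the agent maximizes a linear scalarization of its vector return, and that the vector returns of a few candidate policies are known. The function name `infer_weights`, the grid search, and all numbers (loosely modeled on a treasure-value versus time-penalty trade-off) are hypothetical.

```python
import numpy as np

def infer_weights(demo_return, candidate_returns, grid_size=101):
    """Illustrative only (not DWPI): average the 2-objective weight vectors
    under which the demonstrated return scores at least as well as every
    candidate return, assuming linear scalarization."""
    demo = np.asarray(demo_return, dtype=float)
    cands = np.asarray(candidate_returns, dtype=float)

    # Grid over the 1-D weight simplex: (w1, 1 - w1).
    w1 = np.linspace(0.0, 1.0, grid_size)
    weights = np.stack([w1, 1.0 - w1], axis=1)

    demo_scores = weights @ demo                    # scalarized demo return per weight
    best_scores = (weights @ cands.T).max(axis=1)   # best candidate per weight

    # Keep weights that make the demonstration (near-)optimal.
    consistent = weights[demo_scores >= best_scores - 1e-9]
    if len(consistent) == 0:
        return None  # no weight on the grid explains the demonstration
    return consistent.mean(axis=0)

# Hypothetical vector returns: (treasure value, negative time cost).
candidates = [(1.0, -1.0), (5.0, -3.0), (10.0, -9.0)]
inferred = infer_weights((5.0, -3.0), candidates)
print(inferred)
```

Under these assumptions, a demonstration that reaches the middle option is explained only by weights that balance the two objectives, so the sketch returns a weight vector from that band. A demonstration that no weight makes optimal yields `None`, which is one reason a method that handles sub-optimal demonstrations, as the abstract reports for DWPI, needs something beyond this kind of consistency check.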

research
04/27/2023

Preference Inference from Demonstration in Multi-objective Multi-agent Decision Making

It is challenging to quantify numerical preferences for different object...
research
05/10/2021

Multi-Objective Controller Synthesis with Uncertain Human Preferences

Multi-objective controller synthesis concerns the problem of computing a...
research
12/18/2015

Learning the Preferences of Ignorant, Inconsistent Agents

An important use of machine learning is to learn what people value. What...
research
08/08/2022

Improving performance in multi-objective decision-making in Bottles environments with soft maximin approaches

Balancing multiple competing and conflicting objectives is an essential ...
research
02/21/2023

Inferring Implicit Trait Preferences for Task Allocation in Heterogeneous Teams

Task allocation in heterogeneous multi-agent teams often requires reason...
research
12/20/2018

Decentralized Decision-Making Over Multi-Task Networks

In important applications involving multi-task networks with multiple ob...
research
10/27/2011

User preference extraction using dynamic query sliders in conjunction with UPS-EMO algorithm

One drawback of evolutionary multiobjective optimization algorithms (EMO...
