Predicting optimal value functions by interpolating reward functions in scalarized multi-objective reinforcement learning

09/11/2019
by Arpan Kusari, et al.

A common approach to defining a reward function for Multi-objective Reinforcement Learning (MORL) problems is a weighted sum of the multiple objectives. The weights are then treated as design parameters that depend on the expertise (and preference) of the person performing the learning, with the typical result that a new solution is required for any change in these settings. This paper investigates the relationship between the reward function and the optimal value function for MORL, specifically addressing the question of how to approximate the optimal value function well beyond the set of weights for which the optimization problem was actually solved, thereby avoiding the need to recompute it for each new choice of weights. We prove that the value function transforms smoothly given a transformation of the weights of the reward function (and thus yields a smooth interpolation in the policy space). A Gaussian process is used to obtain a smooth interpolation of the optimal value function over the reward function weights for three well-known examples: GridWorld, Objectworld and Pendulum. The results show that the interpolation can provide robust values for sample states and actions in both discrete and continuous domain problems. Significant advantages arise from using this interpolation technique in the domain of autonomous vehicles: easy, instant adaptation to user preferences while driving and true randomization of obstacle vehicle behavior preferences during training.
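The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the five-state chain MDP, the two toy objectives, the RBF kernel and its hyperparameters, and all function names below are assumptions chosen for illustration. The sketch solves the weighted-sum problem exactly by value iteration for a handful of weights, then uses a Gaussian-process posterior mean to predict the optimal value of one state at a weight that was never solved.

```python
import math

N_STATES, GAMMA = 5, 0.9  # hypothetical toy chain MDP, not from the paper

def step(s, a):
    """Deterministic chain: a=0 moves left, a=1 moves right; walls clamp."""
    return max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))

def reward(s_next, w):
    """Weighted sum of two toy objectives, with weights w and 1 - w."""
    r_goal = 1.0 if s_next == N_STATES - 1 else 0.0  # objective 1: right end
    r_home = 0.5 if s_next == 0 else 0.0             # objective 2: left end
    return w * r_goal + (1.0 - w) * r_home

def optimal_values(w, iters=500):
    """Value iteration on the scalarized reward for a fixed weight w."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [max(reward(step(s, a), w) + GAMMA * v[step(s, a)] for a in (0, 1))
             for s in range(N_STATES)]
    return v

def rbf(x, y, length=0.1):
    """Squared-exponential kernel over the scalar reward weight."""
    return math.exp(-0.5 * ((x - y) / length) ** 2)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def gp_fit(ws, ys, noise=1e-6):
    """Return the constant prior mean and the weights alpha = K^-1 (y - mean)."""
    mean = sum(ys) / len(ys)
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(ws)]
         for i, a in enumerate(ws)]
    return mean, solve(k, [y - mean for y in ys])

def gp_predict(w, ws, mean, alpha):
    """GP posterior mean of the optimal value at an unseen weight w."""
    return mean + sum(rbf(w, wi) * ai for wi, ai in zip(ws, alpha))

STATE = 2                                   # state whose value is interpolated
train_ws = [i / 10 for i in range(11)]      # weights that were actually solved
targets = [optimal_values(w)[STATE] for w in train_ws]
mean, alpha = gp_fit(train_ws, targets)

query_w = 0.55                              # weight never solved directly
pred = gp_predict(query_w, train_ws, mean, alpha)
exact = optimal_values(query_w)[STATE]
print(f"GP prediction: {pred:.3f}  value iteration: {exact:.3f}")
```

In this sketch, one cheap kernel evaluation replaces a full value-iteration solve at the query weight. The prediction is only approximate, and in this toy problem it is least reliable near the weight at which the optimal policy switches direction.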

research
09/17/2018

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Many reinforcement-learning researchers treat the reward function as a p...
research
03/13/2023

Kernel Density Bayesian Inverse Reinforcement Learning

Inverse reinforcement learning (IRL) is a powerful framework to infer an...
research
01/01/2022

Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement Learning

Reinforcement learning (RL) has drawn increasing interests in recent yea...
research
10/09/2021

Active Altruism Learning and Information Sufficiency for Autonomous Driving

Safe interaction between vehicles requires the ability to choose actions...
research
03/23/2021

Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification

In the standard Markov decision process formalism, users specify tasks b...
research
10/02/2019

Relationship Explainable Multi-objective Optimization Via Vector Value Function Based Reinforcement Learning

Solving multi-objective optimization problems is important in various ap...
research
08/26/2021

A New Interpolation Approach and Corresponding Instance-Based Learning

Starting from finding approximate value of a function, introduces the me...
