On the Expressivity of Markov Reward

11/01/2021
by   David Abel, et al.
15

Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/22/2023

On the Expressivity of Multidimensional Markov Reward

We consider the expressivity of Markov rewards in sequential decision ma...
research
02/05/2021

Deceptive Reinforcement Learning for Privacy-Preserving Planning

In this paper, we study the problem of deceptive reinforcement learning ...
research
09/19/2022

Understanding reinforcement learned crowds

Simulating trajectories of virtual crowds is a commonly encountered task...
research
08/24/2023

Predator-prey survival pressure is sufficient to evolve swarming behaviors

The comprehension of how local interactions arise in global collective b...
research
06/02/2022

Learning Soft Constraints From Constrained Expert Demonstrations

Inverse reinforcement learning (IRL) methods assume that the expert data...
research
10/29/2021

Learning to Be Cautious

A key challenge in the field of reinforcement learning is to develop age...
research
05/12/2021

Acting upon Imagination: when to trust imagined trajectories in model based reinforcement learning

Model based reinforcement learning (MBRL) uses an imperfect model of the...

Please sign up or login with your details

Forgot password? Click here to reset