If MaxEnt RL is the Answer, What is the Question?

10/04/2019
by   Benjamin Eysenbach, et al.

Experiments show that humans and animals often do not maximize their expected utility; instead, they choose outcomes randomly, with probability proportional to expected utility. This strategy, called probability matching, is equivalent to maximum entropy reinforcement learning (MaxEnt RL). However, MaxEnt RL does not optimize expected utility. In this paper, we formally show that MaxEnt RL does optimally solve certain classes of control problems with variability in the reward function. In particular, we show (1) that MaxEnt RL can be used to solve a certain class of POMDPs, and (2) that MaxEnt RL is equivalent to a two-player game where an adversary chooses the reward function. These results suggest a deeper connection between MaxEnt RL, robust control, and POMDPs, and provide insight into the types of problems for which we might expect MaxEnt RL to produce effective solutions. Specifically, our results suggest that domains with uncertainty in the task goal may be especially well suited to MaxEnt RL methods.
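The gap between probability matching and expected-utility maximization can be illustrated with a minimal sketch in a one-step (bandit) setting. This is not the paper's construction: the utilities, the choice of reward as log-utility, the temperature of 1, and all function names are assumptions made for illustration. Under those assumptions, the policy maximizing expected reward plus entropy is the softmax of the rewards, which selects each action with probability proportional to its utility.

```python
import math

def softmax(rewards, temperature=1.0):
    """Maximizer of E_pi[r] + temperature * H(pi) over one-step policies."""
    m = max(rewards)  # subtract the max for numerical stability
    exps = [math.exp((r - m) / temperature) for r in rewards]
    z = sum(exps)
    return [e / z for e in exps]

def expected_reward(policy, rewards):
    return sum(p * r for p, r in zip(policy, rewards))

def entropy(policy):
    return -sum(p * math.log(p) for p in policy if p > 0.0)

def maxent_objective(policy, rewards, temperature=1.0):
    """Entropy-regularized objective: expected reward plus policy entropy."""
    return expected_reward(policy, rewards) + temperature * entropy(policy)

# Hypothetical utilities for three actions (illustrative values only).
utilities = [3.0, 2.0, 1.0]
# Assumption: reward is log-utility, so exp(reward) recovers the utility.
rewards = [math.log(u) for u in utilities]

matching = softmax(rewards)   # probability proportional to utility
greedy = [1.0, 0.0, 0.0]      # always pick the highest-utility action

print("matching policy:", [round(p, 3) for p in matching])
print("expected reward  greedy vs matching:",
      expected_reward(greedy, rewards), expected_reward(matching, rewards))
print("MaxEnt objective greedy vs matching:",
      maxent_objective(greedy, rewards), maxent_objective(matching, rewards))
```

In this toy setting the greedy policy attains the higher expected reward, while the matching policy attains the higher entropy-regularized objective, which is one way to read the statement that MaxEnt RL does not optimize expected utility.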


