Extrapolation in Gridworld Markov-Decision Processes

04/14/2020
by Eugene Charniak, et al.

Extrapolation in reinforcement learning is the ability to generalize at test time given states that could never have occurred at training time. Here we consider four factors that lead to improved extrapolation in a simple Gridworld environment: (a) avoiding maximum Q-value (or other deterministic methods) for action choice at test time, (b) ego-centric representation of the Gridworld, (c) building rotational and mirror symmetry into the learning mechanism using rotational and mirror invariant convolution (rather than standard translation-invariant convolution), and (d) adding a maximum entropy term to the loss function to encourage equally good actions to be chosen equally often.
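To make the four factors concrete, the sketch below shows one way factors (a), (c), and (d) could look in code. It is illustrative only, not the authors' implementation: names such as D4SymmetricConv2d, sample_action, temperature, and entropy_weight are hypothetical, and the kernel-averaging construction is just one simple way to obtain rotation- and mirror-symmetric filters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class D4SymmetricConv2d(nn.Module):
    """Convolution whose kernel is averaged over the 8 rotations and
    reflections of the dihedral group D4 -- one simple way to build
    rotational and mirror symmetry into the filters (factor (c))."""

    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_channels))
        self.padding = kernel_size // 2

    def forward(self, x):
        # Symmetrize the kernel: average it over its D4 orbit so the filter
        # responds identically to rotated and mirrored local patterns.
        variants = []
        for flipped in (False, True):
            w = torch.flip(self.weight, dims=[3]) if flipped else self.weight
            for k in range(4):
                variants.append(torch.rot90(w, k, dims=[2, 3]))
        w_sym = torch.stack(variants).mean(dim=0)
        return F.conv2d(x, w_sym, self.bias, padding=self.padding)


def sample_action(q_values, temperature=1.0):
    """Sample from a softmax over Q-values instead of taking the argmax,
    so equally good actions are chosen equally often (factor (a))."""
    probs = F.softmax(q_values / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)


def td_loss_with_entropy_bonus(q_values, actions, td_targets, entropy_weight=0.01):
    """Standard TD loss minus an entropy bonus on the induced action
    distribution, encouraging ties among equally good actions to be
    preserved rather than broken arbitrarily (factor (d))."""
    q_taken = q_values.gather(-1, actions)        # Q(s, a) for the taken actions
    td_loss = F.mse_loss(q_taken, td_targets)
    probs = F.softmax(q_values, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    return td_loss - entropy_weight * entropy
```

In this sketch the symmetric convolution and the entropy bonus are applied at training time inside the Q-network and its loss, while sample_action replaces the usual argmax at test time.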


Related research

- Fast Value Iteration for Goal-Directed Markov Decision Processes (02/06/2013). Planning problems where effects of actions are non-deterministic can be ...
- Changing Model Behavior at Test-Time Using Reinforcement Learning (02/24/2017). Machine learning models are often used at test-time subject to constrain...
- Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes (08/22/2019). Off-policy evaluation (OPE) in reinforcement learning allows one to eval...
- Learning and Planning for Time-Varying MDPs Using Maximum Likelihood Estimation (11/29/2019). This paper proposes a formal approach to learning and planning for agent...
- Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions (12/03/2015). Many real-world problems come with action spaces represented as feature ...
- Action Redundancy in Reinforcement Learning (02/22/2021). Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning p...
