Robust Predictable Control

09/07/2021
by Benjamin Eysenbach, et al.

Many of the challenges facing today's reinforcement learning (RL) algorithms, such as robustness, generalization, transfer, and computational efficiency, are closely related to compression. Prior work has convincingly argued why minimizing information is useful in the supervised learning setting, but standard RL algorithms lack an explicit mechanism for compression. The RL setting is unique because (1) its sequential nature allows an agent to use past information to avoid looking at future observations, and (2) the agent can optimize its behavior to prefer states where decision making requires few bits. We take advantage of these properties to propose a method, robust predictable control (RPC), for learning simple policies. This method brings together ideas from information bottlenecks, model-based RL, and bits-back coding into a simple and theoretically justified algorithm. Our method jointly optimizes a latent-space model and a policy to be self-consistent, so that the policy avoids states where the model is inaccurate. We demonstrate that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck. We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.
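The abstract's central idea is to charge the agent for the information it reads from each observation beyond what a latent-space model could already predict from the past. A policy that maximizes reward minus this charge is pushed toward predictable states. The sketch below illustrates that trade-off under stated assumptions and is not the authors' implementation. It assumes a diagonal-Gaussian encoder q(z_t | s_t) and a diagonal-Gaussian latent model p(z_t | z_{t-1}, a_{t-1}), measures the cost as their closed-form KL divergence in nats, and uses a hypothetical trade-off weight `lam`. The function names are illustrative only.

```python
import math
from typing import Sequence


def gaussian_kl(mu_q: Sequence[float], std_q: Sequence[float],
                mu_p: Sequence[float], std_p: Sequence[float]) -> float:
    """KL(q || p) in nats between two diagonal Gaussians, summed over dimensions.

    Here q plays the role of the (assumed) encoder q(z_t | s_t) and p the
    (assumed) latent model's prediction p(z_t | z_{t-1}, a_{t-1}).
    """
    if not (len(mu_q) == len(std_q) == len(mu_p) == len(std_p)):
        raise ValueError("all arguments must have the same dimension")
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, std_q, mu_p, std_p):
        # Closed-form KL for one dimension of two univariate Gaussians.
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def info_regularized_reward(reward: float,
                            enc_mu: Sequence[float], enc_std: Sequence[float],
                            prior_mu: Sequence[float], prior_std: Sequence[float],
                            lam: float) -> float:
    """Task reward minus lam times the information (in nats) the encoder uses
    beyond what the latent model already predicts. `lam` is a hypothetical weight."""
    return reward - lam * gaussian_kl(enc_mu, enc_std, prior_mu, prior_std)


# When the latent model predicts the encoder exactly, no information is charged.
print(info_regularized_reward(1.0, [0.3], [0.5], [0.3], [0.5], lam=0.1))
# A one-standard-deviation mismatch costs 0.5 nats, so the reward drops to about 0.95.
print(info_regularized_reward(1.0, [1.0], [1.0], [0.0], [1.0], lam=0.1))
```

Using the closed-form KL keeps the example deterministic. It equals the expectation, over encoder samples, of the per-step log-ratio log q(z_t | s_t) - log p(z_t | z_{t-1}, a_{t-1}), which is the quantity a sampled estimate would average to.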
