Reinforcement Learning as Iterative and Amortised Inference

by Beren Millidge, et al.

There are several ways to categorise reinforcement learning (RL) algorithms: model-based versus model-free, policy-based versus planning-based, on-policy versus off-policy, and online versus offline. Broad classification schemes such as these help provide a unified perspective on disparate techniques and can contextualise and guide the development of new algorithms. In this paper, we use the control-as-inference framework to outline a novel classification scheme based on amortised and iterative inference. We demonstrate that a wide range of algorithms can be classified in this manner, providing a fresh perspective and highlighting a number of existing similarities. Moreover, we show that taking this perspective allows us to identify parts of the algorithmic design space that have been relatively unexplored, suggesting routes to innovative new RL algorithms.
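The amortised/iterative distinction can be illustrated with a toy sketch (a hypothetical example, not code from the paper, and deliberately simplified). Amortised inference fits a shared parametric policy once over many states and then answers any new state with a single cheap forward pass; iterative inference instead runs a fresh per-state optimisation loop over the action itself, as a planner would:

```python
import numpy as np

# Toy setting: 1-D state, 1-D action, known reward r(s, a) = -(a - s)**2,
# so the optimal action for state s is simply a = s.

def reward(s, a):
    return -(a - s) ** 2

# Amortised inference: a parametric policy a = w * s is trained once across
# many states; at test time any state is handled by one forward pass.
def fit_amortised_policy(states, lr=0.1, epochs=200):
    w = 0.0
    for _ in range(epochs):
        for s in states:
            a = w * s
            # Gradient of reward w.r.t. w: dr/da * da/dw = -2*(a - s) * s
            w += lr * (-2.0 * (a - s) * s)
    return w

# Iterative inference: for each individual state, optimise the action itself
# from scratch (here, plain gradient ascent on the known reward).
def iterative_action(s, lr=0.1, steps=100):
    a = 0.0
    for _ in range(steps):
        a += lr * (-2.0 * (a - s))  # dr/da
    return a

states = np.linspace(-1.0, 1.0, 11)
w = fit_amortised_policy(states)
s_new = 0.7
print(w * s_new)                # amortised: one multiply at test time
print(iterative_action(s_new))  # iterative: a per-state optimisation loop
```

Both routes recover the optimal action here, but they trade off differently: the amortised policy pays its cost up front and generalises across states, while the iterative procedure pays per query but needs no training phase — the axis along which the paper organises existing RL algorithms.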


Control as Hybrid Inference

The field of reinforcement learning can be split into model-based and mo...

Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective

Off-policy Learning to Rank (LTR) aims to optimize a ranker from data co...

Iterative Amortized Policy Optimization

Policy networks are a central feature of deep reinforcement learning (RL...

A Unified Framework for Alternating Offline Model Training and Policy Learning

In offline model-based reinforcement learning (offline MBRL), we learn a...

Local Search for Policy Iteration in Continuous Control

We present an algorithm for local, regularized, policy improvement in re...

Policy Resilience to Environment Poisoning Attacks on Reinforcement Learning

This paper investigates policy resilience to training-environment poison...

AcroMonk: A Minimalist Underactuated Brachiating Robot

Brachiation is a dynamic, coordinated swinging maneuver of body and arms...