1 Introduction
Reinforcement learning (RL) aims to learn behavior policies to optimize a long-term decision-making process in an environment. RL is thus relevant to a variety of real-world applications, such as robotics, patient care, and recommendation systems. When tackling problems associated with the RL setting, two main difficulties arise: first, the decision-making problem is inherently sequential, with early decisions made by the policy affecting outcomes both near and far in the future; second, the learner’s knowledge of the environment is only through sampled experience, i.e., previously sampled trajectories of interactions, while the underlying mechanism governing the dynamics of these trajectories is unknown.
The environment and its underlying dynamics are typically abstracted as a Markov decision process (MDP). This abstraction gives rise to the Bellman recurrence, which characterizes the optimal value function and behavior policy through a dynamic programming (DP) view of the RL problem
(Bellman, 1966). Most of the effective existing RL algorithms are rooted in this dynamic programming paradigm, attempting to find approximate fixed-point solutions to the Bellman recurrence, leading to the family of temporal-difference (TD) algorithms, including SARSA (Sutton, 1996), Q-learning (Watkins, 1989), and their deep learning variants
(Mnih et al., 2015; Hasselt et al., 2016; Wang et al., 2015). While the TD-based algorithms are powerful, their training can oscillate or even diverge in settings where function approximation is used or the ability to sample additional interactions with the environment is limited (Sutton and Barto, 1998). An alternative paradigm for RL is based on linear programming (LP). A number of RL problems, such as policy optimization and policy evaluation, can be expressed as an LP –
i.e., an optimization problem involving a linear objective and linear constraints. The LP may then be transformed to a form more amenable to stochastic and large-scale optimization via the tools of LP duality. Although the LP perspective has existed for decades (e.g., Manne (1960); Denardo (1970)), it has recently received renewed interest for its potential ability to circumvent the optimization challenges of DP-based approaches in exchange for more mature and well-studied techniques associated with convex optimization (De Farias and Van Roy, 2003; Wang et al., 2007; Chen and Wang, 2016; Wang, 2017; Bas-Serrano and Neu, 2019). In this article, we generalize the LP approach and describe a number of convex problems relevant to RL – i.e., formulations of RL problems as a convex objective and linear constraints. With a convex objective, one must appeal to the more general Fenchel-Rockafellar duality. Perhaps the most useful property of this generalization is that, when the original primal problem involves a strictly convex objective (unlike the LP setting), application of Fenchel-Rockafellar duality leads to a dual problem which is unconstrained. We show that Fenchel-Rockafellar duality and its variants are at the heart of a number of recent RL algorithms, although many of these were originally presented through less generalizable derivations (e.g., the DICE family of offline RL algorithms: Nachum et al. (2019a, b); Kostrikov et al. (2019); Zhang et al. (2020)). By providing a unified perspective on these results and the tools and tricks which lead to them, we hope to enable future researchers to better use the techniques of convex duality to make further progress in RL.
Aiming to provide a useful reference for any interested researcher, we begin by reviewing basic knowledge of convex duality (Section 2) and RL (Section 3). We then focus on the discounted policy evaluation problem in RL (Section 4), originally expressed in an LP form known as the Q-LP. We show how the tools of LP and convex duality may be used to derive a variety of useful reformulations of policy evaluation. We then continue to show how these same techniques can be applied to the policy optimization problem, starting from either the Q-LP (Section 5) or the potentially more streamlined V-LP (Section 6). We then generalize these algorithms to undiscounted settings in Section 7. We conclude in Section 8 with a brief summary and promising future directions.
2 Convex Duality
The concept of duality is a basic and powerful tool in optimization and machine learning, especially in the field of convex analysis, allowing a researcher to easily reformulate optimization problems in alternative ways that are potentially more tractable. In this section, we provide a brief overview of a few key convex duality results which will play an important role in the RL algorithms derived in later sections.
A full and detailed introduction to convex analysis is beyond the scope of this report, and so most of our statements will be presented informally (for example, we use max and min as opposed to sup and inf, and we use sums and integrals without ambiguity), although we will strive to properly qualify theoretical claims when appropriate. The curious reader may refer to a number of resources for a more complete and mathematically precise treatment of this subject; e.g., Rockafellar (1970); Boyd and Vandenberghe (2004); Borwein and Lewis (2010); Bauschke and Lucet (2012).
2.1 Fenchel Conjugate
The Fenchel conjugate f* of a function f : Ω → ℝ is defined as

f*(y) := max_{x ∈ Ω} ⟨x, y⟩ - f(x),   (1)

where ⟨·, ·⟩ denotes the inner product defined on Ω. This function is also referred to as the convex conjugate or Legendre-Fenchel transformation of f.
We say a function f is proper when {x : f(x) < ∞} is nonempty and f(x) > -∞ for all x. We say a function f is lower semi-continuous when {x : f(x) > α} is an open set for all α ∈ ℝ. For a proper, convex, lower semi-continuous f, its conjugate function f* is also proper, convex, and lower semi-continuous. Moreover, one has the duality f** = f; i.e.,

f(x) = max_{y ∈ Ω*} ⟨x, y⟩ - f*(y),   (2)

where Ω* denotes the domain of f*. From now on we will assume any use of f* is with respect to a convex function f. Furthermore, we will assume any declared convex function is also proper and lower semi-continuous. Table 1 provides a few common functions and their corresponding Fenchel conjugates.
Function f(x) | Conjugate f*(y) | Notes
(1/2)x² | (1/2)y² |
(1/p)|x|^p | (1/q)|y|^q | For p, q > 1 and 1/p + 1/q = 1.
δ_C(x) | max_{x∈C} ⟨x, y⟩ | δ_C(x) is 0 if x ∈ C and ∞ otherwise.
e^x | y log y - y | With 0 log 0 := 0.
D_f(x‖p) | E_{z∼p}[f*(y(z))] | For f convex and p a distribution over Z.
D_KL(x‖p) | log E_{z∼p}[exp y(z)] | For x ∈ Δ(Z), i.e., x a normalized distribution over Z.
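The conjugate pairs above are easy to verify numerically; a minimal sketch (assuming numpy, with finite grids standing in for the domains) checks both the conjugate of the self-conjugate function f(x) = x²/2 and the biconjugacy f** = f:

```python
import numpy as np

# f(x) = x^2 / 2 is self-conjugate: f*(y) = y^2 / 2, so f** = f as well.
xs = np.linspace(-10.0, 10.0, 20001)   # fine grid over the primal domain
ys = np.linspace(-10.0, 10.0, 2001)    # coarser grid over the dual domain

def f(x):
    return 0.5 * x ** 2

def f_star(y):
    # f*(y) = max_x <x, y> - f(x), approximated by a grid maximum
    return np.max(xs * y - f(xs))

f_star_on_grid = np.array([f_star(y) for y in ys])

def f_star_star(x):
    # f**(x) = max_y <x, y> - f*(y)
    return np.max(ys * x - f_star_on_grid)

assert abs(f_star(2.0) - 0.5 * 2.0 ** 2) < 1e-3
assert abs(f_star_star(1.5) - f(1.5)) < 1e-3
```

The grids only need to be wide enough to contain the maximizing points; here the optimum of each inner problem lies well inside [-10, 10].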
2.1.1 The Indicator Function
One especially useful function is the indicator function δ_C of a set C, which is defined as,

δ_C(x) := { 0 if x ∈ C; ∞ otherwise }.   (3)

If C is a closed, convex set, it is easy to check that δ_C is proper, convex, and lower semi-continuous. The indicator function can be used as a way of expressing constraints. For example, the constrained optimization problem min_{x∈C} f(x) may be alternatively expressed as the unconstrained problem min_x f(x) + δ_C(x). It may be readily shown that the conjugate of δ_{{a}} is the linear function ⟨a, y⟩, and vice-versa.
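As a quick numerical illustration of this fact, the conjugate of δ_C is the support function max_{x∈C} ⟨x, y⟩; the following minimal sketch (assuming numpy, with a grid standing in for the interval C = [0, 1]) checks that the grid maximum matches the closed form max(y, 0):

```python
import numpy as np

# delta_C*(y) = max_x <x, y> - delta_C(x) = max_{x in C} <x, y>,
# the support function of C.  For C = [0, 1] this is max(y, 0).
xs = np.linspace(0.0, 1.0, 10001)   # grid over C; points outside C would
                                    # contribute -infinity and are omitted

def conj(y):
    return np.max(xs * y)

for y in (-2.0, -0.3, 0.0, 0.7, 5.0):
    assert np.isclose(conj(y), max(y, 0.0))
```

The other direction (the conjugate of a linear function ⟨a, ·⟩ being δ_{{a}}) is harder to check on a bounded grid, since the maximum is +∞ whenever y ≠ a.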
2.1.2 Divergences
The family of f-divergences, also known as Csiszár-Morimoto or Ali-Silvey divergences (Ali and Silvey, 1966), has been widely applied in many machine learning applications, including variational inference (Wainwright and Jordan, 2003), generative model estimation (Nowozin et al., 2016; Dai et al., 2019), imitation learning (Ke et al., 2019; Ghasemipour et al., 2019; Kostrikov et al., 2019), and reinforcement learning (Nachum et al., 2019a; Zhang et al., 2020; Nachum et al., 2019b). For a convex function f and a distribution p over some domain Z, the f-divergence of x from p is defined as,

D_f(x‖p) := E_{z∼p}[f(x(z)/p(z))].   (4)

Typically, f-divergences are used to measure the discrepancy between two distributions (i.e., x ∈ Δ(Z), the simplex over Z, and D_f(x‖p) measures the divergence between x and p), although the domain of D_f(·‖p) may be extended to the set of real-valued functions x : Z → ℝ.
The choice of domain of x is important when considering the Fenchel conjugate of D_f(·‖p). If the domain is the set of unrestricted real-valued functions, the conjugate of D_f(·‖p) at y : Z → ℝ is, under mild conditions,^1 (^1 Conditions of the interchangeability principle (Dai et al., 2016) must be satisfied, and p must have sufficient support over Z.)

D*_f(y) = max_{x:Z→ℝ} ⟨x, y⟩ - E_{z∼p}[f(x(z)/p(z))]   (5)
        = max_{x:Z→ℝ} E_{z∼p}[(x(z)/p(z))·y(z) - f(x(z)/p(z))]   (6)
        = E_{z∼p}[f*(y(z))],   (7)

where the maximization in (6) may be performed pointwise for each z. On the other hand, if one considers the domain of D_f(·‖p) to be the simplex Δ(Z), then one must solve a constrained version of (5), which can be difficult depending on the form of f.
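The unconstrained conjugate can be checked numerically on a small finite domain Z; a sketch assuming scipy and the illustrative choice f(t) = t²/2 (so f*(s) = s²/2), with p and y randomly generated:

```python
import numpy as np
from scipy.optimize import minimize

# Verify that, for unrestricted x : Z -> R, the conjugate of D_f(.||p) at y
# equals E_{z~p}[f*(y(z))].  Here f(t) = t^2 / 2, so f*(s) = s^2 / 2, |Z| = 4.
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(4))        # distribution p over Z
y = rng.normal(size=4)               # argument y : Z -> R

def f(t):
    return 0.5 * t ** 2

def neg_value(x):
    # -( <x, y> - D_f(x || p) ), to be minimized by a generic optimizer
    return -(x @ y - np.sum(p * f(x / p)))

res = minimize(neg_value, x0=np.zeros(4))
closed_form = np.sum(p * 0.5 * y ** 2)   # E_p[f*(y)]
assert np.isclose(-res.fun, closed_form, atol=1e-5)
```

The optimizer also recovers the pointwise maximizer x*(z) = p(z)·y(z), matching the pointwise solution of (6).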
The KL Divergence
Of the family of f-divergences, the KL divergence is arguably the most commonly used one, and it is given by,

D_KL(x‖p) := E_{z∼x}[log(x(z)/p(z))],   (8)

which is the result of choosing f(t) = t log t in (4). For the KL divergence, the constrained version of (5) may be shown^2 (^2 See Example 3.25 in Boyd and Vandenberghe (2004).) to yield the conjugate function

D*_KL(y) = log E_{z∼p}[exp y(z)].   (9)

It is no coincidence that the log-average-exp function (and the closely related softmax) is arguably as ubiquitous in RL as the KL divergence.
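This constrained conjugate can be verified directly in a small finite setting: the maximizer over the simplex is the softmax point x* ∝ p·exp(y), and no other simplex point exceeds the log-average-exp value. A numpy sketch (random p and y, purely illustrative):

```python
import numpy as np

# Check max_{x in simplex} <x, y> - KL(x || p) = log E_{z~p}[exp(y(z))],
# attained at the softmax point x* proportional to p * exp(y).
rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(5))
y = rng.normal(size=5)

def value(x):
    return x @ y - np.sum(x * np.log(x / p))   # <x,y> - KL(x || p), x > 0

lse = np.log(np.sum(p * np.exp(y)))            # log E_p[exp(y)]
x_star = p * np.exp(y) / np.sum(p * np.exp(y)) # softmax candidate

assert np.isclose(value(x_star), lse)
# random points in the simplex never exceed the conjectured maximum
samples = rng.dirichlet(np.ones(5), size=2000)
assert all(value(x) <= lse + 1e-9 for x in samples)
```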
2.2 FenchelRockafellar Duality
Fenchel conjugates are indispensable when tackling a variety of optimization problems. In this section, we present one of the most general and useful tools associated with Fenchel conjugates, known as Fenchel-Rockafellar duality (Rockafellar, 1970; Borwein and Lewis, 2010).
Consider a primal problem given by

min_x J_P(x) := f(x) + g(Ax),   (10)

where f, g are convex, lower semi-continuous functions and A is a linear operator (e.g., a matrix). The dual of this problem is given by

max_y J_D(y) := -g*(y) - f*(-A*y),   (11)

where we use A* to denote the adjoint linear operator of A; i.e., A* is the linear operator for which ⟨y, Ax⟩ = ⟨A*y, x⟩ for all x, y. In the common case of A simply being a real-valued matrix, A* is the transpose of A.
Under mild conditions,^3 (^3 See Theorem 3.3.5 in Borwein and Lewis (2010). Informally, the primal problem needs to be feasible; i.e., f(x) + g(Ax) < ∞ for some x.) the dual problem (11) may be derived from the primal (10) via

min_x f(x) + g(Ax) = min_x max_y f(x) + ⟨y, Ax⟩ - g*(y)
                   = max_y min_x f(x) + ⟨A*y, x⟩ - g*(y)
                   = max_y -g*(y) - f*(-A*y).   (12)

Thus, we have the duality,

min_x f(x) + g(Ax) = max_y -g*(y) - f*(-A*y).   (13)

Furthermore, one may show that a solution to the dual can be used to find a solution to the primal. Specifically, if y* is a solution to the dual and ∇f*(-A*y*) is well-defined, then x* = ∇f*(-A*y*) is a solution to the primal. More generally, one can recover the subdifferential ∂f*(-A*y*) as the set of all primal solutions.
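Both strong duality and the primal-recovery rule can be sanity-checked on a quadratic instance where every piece is available in closed form. A numpy sketch, using the convention that the dual is max_y -g*(y) - f*(-A*y) (one of several equivalent sign conventions), with f(x) = ‖x‖²/2 and g(u) = ‖u - b‖²/2:

```python
import numpy as np

# f(x)  = ||x||^2 / 2        -> f*(y) = ||y||^2 / 2
# g(u)  = ||u - b||^2 / 2    -> g*(y) = <b, y> + ||y||^2 / 2
# Primal: min_x f(x) + g(Ax);  Dual: max_y -g*(y) - f*(-A^T y).
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
b = rng.normal(size=3)

# Primal solution from the stationarity condition x + A^T (A x - b) = 0
x_star = np.linalg.solve(np.eye(4) + A.T @ A, A.T @ b)
primal = 0.5 * x_star @ x_star + 0.5 * np.sum((A @ x_star - b) ** 2)

# Dual solution: gradient of -<b,y> - ||y||^2/2 - ||A^T y||^2/2 vanishes
y_star = np.linalg.solve(np.eye(3) + A @ A.T, -b)
dual = -(b @ y_star + 0.5 * y_star @ y_star) - 0.5 * np.sum((A.T @ y_star) ** 2)

assert np.isclose(primal, dual)             # strong duality
assert np.allclose(x_star, -A.T @ y_star)   # x* = grad f*(-A^T y*)
```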
Of course, in the presence of Fenchel-Rockafellar duality, the labels of primal and dual are arbitrary. One can consider (11) the primal problem and (10) its dual, and in our derivations we will use these labels interchangeably.
2.2.1 The Lagrangian
Fenchel-Rockafellar duality is general enough that it can be used to derive the Lagrangian duality. Consider the constrained optimization problem

min_x f(x) s.t. Ax = b.   (14)

If we consider this problem expressed as min_x f(x) + g(Ax) for g = δ_{{b}}, its Fenchel-Rockafellar dual is given by

max_y -⟨b, y⟩ - f*(-A*y).   (15)

By considering f* in terms of its Fenchel conjugate (equation (1)), we may write the problem as

max_y min_x f(x) + ⟨A*y, x⟩ - ⟨b, y⟩.   (16)

Using the fact that ⟨A*y, x⟩ = ⟨y, Ax⟩ for any x, y, we may express this as

max_y min_x f(x) + ⟨y, Ax - b⟩.   (17)

The expression L(x, y) := f(x) + ⟨y, Ax - b⟩ is known as the Lagrangian of the original problem in (14). One may further derive the well-known Lagrange duality:^4 (^4 See Veinott (2005) (https://web.stanford.edu/class/msande361/handouts/nlpdual.pdf) for a brief derivation of this fact and Ekeland and Temam (1999)[Proposition 2.1] for more general cases.)

max_y min_x f(x) + ⟨y, Ax - b⟩ = min_x max_y f(x) + ⟨y, Ax - b⟩.   (18)

Moreover, the optimal value of the Lagrangian is the optimal value of the original problem (14), and the optimal solutions (x*, y*) (equilibrium points) are the solutions to the original primal (14) and its dual (15).
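As a minimal illustration (assuming numpy; the equality-constrained quadratic min ‖x‖²/2 s.t. Ax = b is chosen purely for convenience), the saddle point of the Lagrangian is found by solving the linear stationarity conditions, and it matches the direct minimum-norm solution:

```python
import numpy as np

# Lagrangian L(x, y) = ||x||^2 / 2 + <y, Ax - b>.  The saddle point solves
# the linear (KKT) system:  x + A^T y = 0,  Ax = b.
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 5))
b = rng.normal(size=2)

K = np.block([[np.eye(5), A.T],
              [A, np.zeros((2, 2))]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(5), b]))
x_star, y_star = sol[:5], sol[5:]

# The saddle point recovers the minimum-norm solution of Ax = b
x_direct = A.T @ np.linalg.solve(A @ A.T, b)
assert np.allclose(x_star, x_direct)
assert np.allclose(A @ x_star, b)   # primal feasibility
```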
2.2.2 LP Duality
Fenchel-Rockafellar duality also generalizes the well-known linear programming (LP) duality. Specifically, if one considers the functions f(x) = ⟨c, x⟩ + δ_{ℝ≥0}(x) and g = δ_{{b}}, then the primal and dual problems in (10) and (11) correspond to,

min_x ⟨c, x⟩ s.t. Ax = b, x ≥ 0,   (19)
max_y -⟨b, y⟩ s.t. -A*y ≤ c,   (20)

respectively. By making the switch y → -y, the dual (20) may be equivalently expressed in the more familiar form,

max_y ⟨b, y⟩ s.t. A*y ≤ c.   (21)

Fenchel-Rockafellar duality thus provides us with the strong LP duality theorem. Namely, if the primal problem (19) is feasible, then its result is the same as that of the dual (21).
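Strong LP duality is easy to confirm with an off-the-shelf solver; a sketch assuming scipy's linprog and a small hand-picked feasible instance (both problems here have optimal value 2):

```python
import numpy as np
from scipy.optimize import linprog

# Strong LP duality:  min <c,x> s.t. Ax = b, x >= 0
#              equals max <b,y> s.t. A^T y <= c.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([1.0, 2.0, 3.0])

primal = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))
# linprog minimizes, so maximize <b, y> by minimizing <-b, y>; y is free
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(None, None))

assert primal.status == 0 and dual.status == 0
assert np.isclose(primal.fun, -dual.fun)   # optimal values coincide (= 2 here)
```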
3 Reinforcement Learning
In this work, we will show how Fenchel-Rockafellar duality (and the LP and Lagrangian dualities) can be applied to solve a number of reinforcement learning (RL) problems. Before we present these algorithms, we use this section as a brief introduction to RL.
3.1 Markov Decision Process
In RL, one wishes to learn a behavior policy to interact with an environment in an optimal way, where the typical meaning of ‘optimal’ is with respect to future discounted rewards (feedback) provided by the environment. The RL environment is commonly abstracted as a Markov decision process (MDP) (Puterman, 1994; Sutton and Barto, 1998), which is specified by a tuple M = ⟨S, A, R, T, μ0, γ⟩, consisting of a state space, an action space, a reward function, a transition probability function, an initial state distribution, and a discount factor, respectively. The policy is a function π : S → Δ(A). The policy interacts with the environment iteratively, starting with an initial state s0 ∼ μ0. At step t, the policy produces a distribution π(·|st) over the actions A, from which an action at is sampled and applied to the environment. The environment produces a scalar reward rt = R(st, at)^5 (^5 For simplicity we consider a deterministic reward function. Stochastic rewards are more typical, although the same derivations are usually applicable in either case.) and subsequently transitions to a new state st+1 ∼ T(st, at). In summary, the RL setting is concerned with a policy which sequentially makes decisions, and the effects of those decisions are observed through a per-step reward feedback and a stochastic, Markovian state transition process. For simplicity, we will consider infinite-horizon (non-terminating) environments, which may be extended to finite-horizon environments by considering an extra terminal state which continually loops onto itself with zero reward.
3.2 Policy Evaluation and Optimization
The first question one may ask when presented with an MDP M and a policy π is, what is the long-term value of π when interacting with M? The next question might be, what is the optimal policy maximizing this long-term value? These two questions constitute the policy evaluation and optimization problems, respectively.
To formalize these questions, we consider a discount factor γ ∈ (0, 1).^6 (^6 See Section 7 for consideration of γ = 1.) The value of π is defined as the expected per-step reward obtained by following the policy, averaging over time via discounting:

ρ(π) := (1 - γ)·E[ Σ_{t=0}^∞ γ^t R(st, at) | s0 ∼ μ0, at ∼ π(·|st), st+1 ∼ T(st, at) ].   (22)

The policy evaluation problem is to estimate this quantity for a given π, and the policy optimization problem is to find π which maximizes this quantity, i.e., to solve max_π ρ(π). If the reward function is independent of the policy π, there exists an optimal policy that is deterministic (Puterman, 1994). If one adds a policy-dependent regularization to the objective, e.g., by considering entropy-regularized rewards R(s, a) - log π(a|s), the optimal policy could be stochastic.
3.3 Online vs. Offline RL
One of the main limitations when approaching either the policy evaluation or policy optimization problems is that one does not have explicit knowledge of the environment; i.e., one does not have explicit knowledge of the functions R and T. Rather, access to the environment is given in the form of experience gathered via interactions with the environment. The specific nature of this experience depends on the context of one’s considered problem. The most common forms of experience may be generally categorized into online and offline.
In the online setting, experience from the environment may be collected at any point via Monte-Carlo rollouts. With this type of access to the environment, the policy evaluation and optimization problems may be easily solved. For example, the value of the policy may be estimated by simply averaging the discounted reward of a large number of Monte-Carlo rollouts. For this reason, online RL research typically strives to find sample-efficient algorithms, which find approximate solutions to policy evaluation or optimization with as few interactions with the environment as possible.
In practice (e.g., consumer web recommendation systems or health applications), interaction with the environment during training is not available at all. More commonly, access to the environment is offline. That is, interactions with the environment are limited to a static dataset of (logged) experience D = {(s^(i), a^(i), r^(i), s'^(i))}_{i=1}^N, where (s^(i), a^(i)) ∼ d^D for some unknown distribution d^D, r^(i) = R(s^(i), a^(i)), and s'^(i) ∼ T(s^(i), a^(i)). One also typically assumes access to samples from μ0. In this report we will mostly focus on the offline setting, although we will relax it to assume that our offline experience is effectively unlimited (N → ∞), and so will write our expectations in terms of d^D, T, and μ0. Performing the appropriate finite-sample analysis for finite N is outside the scope of this report. Even with effectively unlimited experience, the offline setting presents a challenge to RL algorithms, due to the mismatch between the experience distribution given by the offline dataset and the online distribution typically needed for policy evaluation or optimization.
Although d^D is a distribution over state-action pairs, we will at times abuse notation and write (s, a, r, s') ∼ d^D, and this is intended to mean (s, a) ∼ d^D, r = R(s, a), s' ∼ T(s, a); i.e., as if we are sampling from the (infinite) dataset D. Moreover, although one does not have explicit access to R or T, we will oftentimes write expressions involving R(s, a) and s' ∼ T(s, a) inside expectations over d^D, and this is intended to mean the corresponding expectation over sampled transitions (s, a, r, s') ∼ d^D.
We emphasize a subtle difference between the offline setting and what is commonly referred to in the literature as off-policy learning. Off-policy algorithms are designed to enable an RL agent to learn from historical samples collected by other policies. However, these algorithms are typically allowed to interact with the environment during training to collect new samples. On the other hand, in the offline setting, one’s access to the environment is exclusively via a fixed dataset of experience. In other words, while an offline RL algorithm is necessarily off-policy, an off-policy algorithm is not necessarily offline.
3.4 Q-values and State-Action Visitations
When evaluating or optimizing policies, both online and offline, the notions of Q-values and state-action visitations are useful. For a policy π, the Q-values denote the future discounted sum of rewards of following π starting at (s, a):

Q^π(s, a) := E[ Σ_{t=0}^∞ γ^t R(st, at) | s0 = s, a0 = a, at ∼ π(·|st), st+1 ∼ T(st, at) ].

The Q-values satisfy the single-step Bellman recurrence

Q^π(s, a) = R(s, a) + γ·P^π Q^π(s, a),   (23)

where P^π is the policy transition operator,

P^π Q(s, a) := E_{s'∼T(s,a), a'∼π(·|s')}[Q(s', a')].   (24)

The state-action visitations of π (also known as occupancies or density) may be defined similarly as,

d^π(s, a) := (1 - γ)·Σ_{t=0}^∞ γ^t Pr(st = s, at = a | s0 ∼ μ0, at ∼ π(·|st), st+1 ∼ T(st, at)).

That is, the visitation d^π(s, a) measures how likely π is to encounter (s, a) when interacting with M, averaging these encounters over time via discounting. The visitations constitute a normalized distribution, and this distribution is referred to as the on-policy distribution.
Like the Q-values, the visitations satisfy the single-step transpose Bellman recurrence:

d^π(s, a) = (1 - γ)·μ0(s)·π(a|s) + γ·P_*^π d^π(s, a),   (25)

where P_*^π is the transpose (or adjoint) policy transition operator,

P_*^π d(s, a) := π(a|s)·Σ_{s̃,ã} T(s|s̃, ã)·d(s̃, ã).   (26)

These recursions simply reflect the conservation of flow (probability mass) of a stationary distribution on a Markov process. Note that both P^π and P_*^π are linear operators and that the transpose policy transition operator is indeed the mathematical transpose (or adjoint) of P^π in the sense that ⟨d, P^π Q⟩ = ⟨P_*^π d, Q⟩ for any d, Q.
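In the tabular case the adjoint relationship can be checked directly by representing P^π as a matrix over state-action pairs, so that the adjoint is literally the matrix transpose. A numpy sketch on a randomly generated MDP (all quantities illustrative):

```python
import numpy as np

# Check <d, P^pi Q> = <P_*^pi d, Q> by building P^pi as an
# (|S||A|) x (|S||A|) matrix; P_*^pi is then its transpose.
rng = np.random.default_rng(0)
nS, nA = 4, 3
T = rng.dirichlet(np.ones(nS), size=(nS, nA))   # T[s, a] = P(. | s, a)
pi = rng.dirichlet(np.ones(nA), size=nS)        # pi[s]   = pi(. | s)

# (P^pi Q)(s,a) = sum_{s',a'} T(s'|s,a) pi(a'|s') Q(s',a')
P_pi = np.einsum('sap,pb->sapb', T, pi).reshape(nS * nA, nS * nA)

d = rng.random(nS * nA)       # an arbitrary nonnegative "visitation"
Q = rng.normal(size=nS * nA)  # an arbitrary value function
assert np.isclose(d @ (P_pi @ Q), (P_pi.T @ d) @ Q)
```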
Both the Q-values and the visitations are useful in RL. For example, the value of a policy may be expressed in two ways:

ρ(π) = (1 - γ)·E_{s0∼μ0, a0∼π(·|s0)}[Q^π(s0, a0)] = E_{(s,a)∼d^π}[R(s, a)].   (27)

Also, when performing policy optimization, the policy gradient theorem (Sutton et al., 2000) utilizes the Q-values and visitations to express the gradient of ρ(π) as

∇_π ρ(π) = E_{(s,a)∼d^π}[Q^π(s, a)·∇ log π(a|s)].   (28)

It is thus standard in most RL algorithms to either have access to Q^π and d^π or have some mechanism to estimate these quantities. Typically, the Q-values are estimated by minimizing squared Bellman residuals (or some variation on this) (Sutton et al., 2008, 2009; Liu et al., 2015; Dai et al., 2016; Du et al., 2017); i.e., minimizing the (surrogate) squared difference between the LHS and RHS of (23). For the visitations, it is more typical to assume access to the distribution d^π (for example, by simply performing Monte-Carlo rollouts enabled by online access), although instances exist in which d^π is approximated by importance-weighting a different (i.e., offline) distribution (Precup et al., 2001; Sutton et al., 2014).
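These definitions can be verified end-to-end on a small random tabular MDP, where the Bellman recurrences (23) and (25) become linear systems; a numpy sketch (all quantities randomly generated, purely for illustration):

```python
import numpy as np

# Tabular policy evaluation: solve the Bellman recurrences exactly and
# confirm that the two expressions for rho(pi) in (27) agree.
rng = np.random.default_rng(0)
nS, nA, gamma = 4, 3, 0.9
T = rng.dirichlet(np.ones(nS), size=(nS, nA))   # transition probabilities
R = rng.random((nS, nA)).reshape(-1)            # reward function
mu0 = rng.dirichlet(np.ones(nS))                # initial-state distribution
pi = rng.dirichlet(np.ones(nA), size=nS)        # policy pi(a|s)

P_pi = np.einsum('sap,pb->sapb', T, pi).reshape(nS * nA, nS * nA)
mu0_pi = (mu0[:, None] * pi).reshape(-1)        # initial state-action dist.
I = np.eye(nS * nA)

# Q^pi from (23):  (I - gamma P^pi) Q = R
Q = np.linalg.solve(I - gamma * P_pi, R)
# d^pi from (25):  (I - gamma P_*^pi) d = (1 - gamma) mu0 pi
d = np.linalg.solve(I - gamma * P_pi.T, (1 - gamma) * mu0_pi)

assert np.isclose(d.sum(), 1.0) and np.all(d >= 0)  # d^pi is a distribution
rho_q = (1 - gamma) * (mu0_pi @ Q)                  # (1-gamma) E[Q(s0, a0)]
rho_d = d @ R                                       # E_{d^pi}[R(s, a)]
assert np.isclose(rho_q, rho_d)
```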
4 Policy Evaluation
We now move on to demonstrating applications of Fenchel-Rockafellar duality to RL. We begin by approaching the policy evaluation problem. Although the policy evaluation problem may appear to be simpler or less interesting (it is not!) than the policy optimization problem, in our case the same techniques will be used in either setting. Thus, we will use this section to provide more detailed derivations of a variety of techniques which will be referenced repeatedly in the following sections.
4.1 The Linear Programming Form of ρ(π)
The equivalent formulations of ρ(π) in (27) in terms of either Q^π or d^π hint at a duality, which is formally given by the following LP characterization of ρ(π), known as the Q-LP:

ρ(π) = min_Q (1 - γ)·E_{s0∼μ0, a0∼π(·|s0)}[Q(s0, a0)]   (29)
       s.t. Q(s, a) ≥ R(s, a) + γ·P^π Q(s, a),   (30)
            ∀(s, a) ∈ S × A.   (31)

The optimal Q* of this LP satisfies Q*(s, a) = Q^π(s, a) for all (s, a) reachable by π.
The dual of this LP provides us with the visitation perspective on policy evaluation:

ρ(π) = max_{d≥0} Σ_{s,a} d(s, a)·R(s, a)   (32)
       s.t. d(s, a) = (1 - γ)·μ0(s)·π(a|s) + γ·P_*^π d(s, a),   (33)
            ∀(s, a) ∈ S × A.   (34)

The optimal d* of this LP is the state-action visitation d^π of π. It is important to note that this dual LP is over-constrained. The equality constraints in (33) uniquely determine d* regardless of the objective in (32). This fact will prove useful in a number of later derivations.
For detailed and complete derivations of these LP representations of ρ(π), please refer to Nachum et al. (2019b).
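In the tabular case, the Q-LP (29)-(31) is a finite LP and can be handed to an off-the-shelf solver; a sketch assuming scipy's linprog and a randomly generated MDP (illustrative only), checking that the LP optimum matches ρ(π) computed by a direct linear solve:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
nS, nA, gamma = 4, 3, 0.9
T = rng.dirichlet(np.ones(nS), size=(nS, nA))
R = rng.random((nS, nA)).reshape(-1)
mu0 = rng.dirichlet(np.ones(nS))
pi = rng.dirichlet(np.ones(nA), size=nS)

P_pi = np.einsum('sap,pb->sapb', T, pi).reshape(nS * nA, nS * nA)
mu0_pi = (mu0[:, None] * pi).reshape(-1)
I = np.eye(nS * nA)

# min (1-gamma) <mu0 pi, Q>  s.t.  (I - gamma P^pi) Q >= R,  Q free
res = linprog(c=(1 - gamma) * mu0_pi,
              A_ub=-(I - gamma * P_pi),
              b_ub=-R,
              bounds=(None, None))

# Reference value: rho(pi) = (1-gamma) <mu0 pi, Q^pi>
rho = (1 - gamma) * mu0_pi @ np.linalg.solve(I - gamma * P_pi, R)
assert res.status == 0 and np.isclose(res.fun, rho)
```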
4.2 Policy Evaluation via the Lagrangian
The potentially large number of constraints in either the primal or dual forms of the Q-LP introduces a challenge to estimating ρ(π). We may instead derive a more tractable unconstrained optimization perspective on the policy evaluation problem using the Lagrangian of the Q-LP:

ρ(π) = min_Q max_{d≥0} (1 - γ)·E_{s0∼μ0, a0∼π(·|s0)}[Q(s0, a0)] + Σ_{s,a} d(s, a)·(R(s, a) + γ·P^π Q(s, a) - Q(s, a)).   (35)

In practical settings where S × A is possibly infinite, it is not feasible to optimize the sum in (35) over all of S × A. In an offline setting, where we only have access to a distribution d^D, we may make a change of variables via importance sampling, i.e., ζ(s, a) := d(s, a)/d^D(s, a). If d^D has sufficient support or coverage (Sutton et al., 2016), we may rewrite (35) as

ρ(π) = min_Q max_{ζ≥0} (1 - γ)·E_{s0∼μ0, a0∼π(·|s0)}[Q(s0, a0)] + E_{(s,a)∼d^D}[ζ(s, a)·(R(s, a) + γ·P^π Q(s, a) - Q(s, a))].   (36)

The optimal ζ* of this problem satisfies ζ*(s, a) = d^π(s, a)/d^D(s, a). Thus, to estimate ρ(π), one may optimize this objective with respect to Q and ζ (requiring only access to samples from d^D, μ0, and π) and return the resulting objective value as the final estimate.
This more practical, offline estimator also has a desirable property, known as the doubly robust property (Funk et al., 2011; Jiang and Li, 2015; Kallus and Uehara, 2019a). Specifically, denoting the objective in (36) for a fixed pair Q, ζ by ρ̂(Q, ζ), we have

ρ̂(Q, ζ*) = ρ(π) = ρ̂(Q^π, ζ) for any Q and any ζ.   (37)

Thus, this estimator is robust to errors in at most one of Q and ζ.
Despite the desirable properties of this estimator, the optimization problem associated with it involves rewards and learning Q-values with respect to these rewards. Learning Q-values using single-step transitions turns out to be difficult in practice without the use of a number of tricks developed over the years (e.g., target networks, ensembling). Moreover, the bilinear nature of the Lagrangian can lead to instability or poor convergence in optimization (Dai et al., 2017; Bas-Serrano and Neu, 2019). Rather than tackling these various issues head-on, a number of recent works propose an alternative approach, which we describe in the following subsection.
4.3 Changing the Problem Before Applying Duality
As mentioned in Section 4.1, the dual of the Q-LP is over-constrained, in the sense that the constraints in (33) uniquely determine the state-action visitation d^π. Thus, one may replace the objective in (32) with some function h(d) without affecting the optimal solution d* = d^π. Therefore, the main idea of a number of recent works is to choose an appropriate h so that either the Lagrangian or the Fenchel-Rockafellar dual of this problem is more approachable and potentially avoids the instabilities associated with the original LP.
Although the problem is changed, the solution is unaffected, and once a solution is found it may be used to provide an estimate of ρ(π). Specifically, if the problem is rewritten in terms of ζ = d/d^D, then once the problem is optimized, we can derive an estimate for the value of π via the approximate solution ζ*:

ρ(π) ≈ E_{(s,a)∼d^D}[ζ*(s, a)·R(s, a)].   (38)
4.3.1 Constant Function
If h is taken to be the trivial function h ≡ 0, the offline form of the Lagrangian optimization becomes,

max_{ζ≥0} min_Q (1 - γ)·E_{s0∼μ0, a0∼π(·|s0)}[Q(s0, a0)] + E_{(s,a)∼d^D}[ζ(s, a)·(γ·P^π Q(s, a) - Q(s, a))].   (39)

The optimal solution of this problem is ζ*(s, a) = d^π(s, a)/d^D(s, a), and once an approximate solution is found, it may be used to estimate ρ(π) according to (38). Unlike the previous form of the Lagrangian in (36), this optimization does not involve learning Q-values with respect to environment rewards, and in practice this distinction leads to much better optimization behavior (Uehara and Jiang, 2019). Still, the Lagrangian is linear in both Q and ζ. This can be remedied by choosing a strictly convex form of h, for example, by using an f-divergence.
4.3.2 f-Divergence
The use of an f-divergence objective leads to the set of general off-policy estimation techniques outlined in the recent DualDICE paper (Nachum et al., 2019a). Specifically, the various estimators proposed by DualDICE correspond to applying either the Lagrange or Fenchel-Rockafellar dualities to the optimization problem,

min_d D_f(d‖d^D)   (40)
s.t. d(s, a) = (1 - γ)·μ0(s)·π(a|s) + γ·P_*^π d(s, a),   (41)
     ∀(s, a) ∈ S × A.   (42)
Lagrange Duality
Application of Lagrange duality to the above problem yields

min_d max_Q D_f(d‖d^D) + Σ_{s,a} Q(s, a)·((1 - γ)·μ0(s)·π(a|s) + γ·P_*^π d(s, a) - d(s, a)).   (43)

We transform the transpose policy transition operator P_*^π to P^π by using the fact ⟨Q, P_*^π d⟩ = ⟨P^π Q, d⟩:

min_d max_Q D_f(d‖d^D) + (1 - γ)·E_{s0∼μ0, a0∼π(·|s0)}[Q(s0, a0)] + Σ_{s,a} d(s, a)·(γ·P^π Q(s, a) - Q(s, a)).   (44)

Now we make the change of variables ζ(s, a) := d(s, a)/d^D(s, a) to yield,

min_ζ max_Q E_{(s,a)∼d^D}[f(ζ(s, a))] + (1 - γ)·E_{s0∼μ0, a0∼π(·|s0)}[Q(s0, a0)] + E_{(s,a)∼d^D}[ζ(s, a)·(γ·P^π Q(s, a) - Q(s, a))].   (45)

Thus we have recovered the general saddle-point form of DualDICE, which proposes to optimize (45) and then use an approximate solution ζ* to estimate ρ(π) via (38).
FenchelRockafellar Duality
Rather than applying Lagrange duality, application of Fenchel-Rockafellar duality to (40) more clearly reveals the wisdom of choosing an f-divergence objective. We write the problem in (40) as

min_d D_f(d‖d^D) + g(A·d),

where g := δ_{{(1-γ)μ0π}} corresponds to the linear constraints (41) with respect to the adjoint Bellman operator; i.e.,

A := I - γ·P_*^π.

When applying Fenchel-Rockafellar duality, the linear operator A is transformed to its adjoint A* = I - γ·P^π and is used as an argument to the Fenchel conjugate of D_f(·‖d^D). At the same time, g is replaced by its Fenchel conjugate g*(Q) = (1 - γ)·E_{s0∼μ0, a0∼π(·|s0)}[Q(s0, a0)].
The dual problem is therefore given by

max_Q -(1 - γ)·E_{s0∼μ0, a0∼π(·|s0)}[Q(s0, a0)]   (46)
      - E_{(s,a)∼d^D}[f*((γ·P^π Q - Q)(s, a))].   (47)

We can now see that the use of an f-divergence with respect to d^D naturally leads to an offline problem with expectations over d^D, without an explicit change of variables. Furthermore, unlike previous dual problems, there are no constraints in this optimization, and so standard gradient-based techniques may be applied to find a solution without appealing to the Lagrange duality, which would necessarily involve nested max-min optimizations. The Fenchel-Rockafellar duality also provides us with a way to recover d^π from a solution Q*:

d^π(s, a) = d^D(s, a)·f*'((γ·P^π Q* - Q*)(s, a)),   (48)

or equivalently,

d^π(s, a)/d^D(s, a) = f*'((γ·P^π Q* - Q*)(s, a)).   (49)

If we set f(x) = (1/2)x², we may recover what is perhaps the most intriguing result in Nachum et al. (2019a):

d^π(s, a)/d^D(s, a) = (γ·P^π Q* - Q*)(s, a), where Q* = argmin_Q (1 - γ)·E_{s0∼μ0, a0∼π(·|s0)}[Q(s0, a0)] + (1/2)·E_{(s,a)∼d^D}[((γ·P^π Q - Q)(s, a))²].   (50)

That is, if one optimizes value functions to minimize squared Bellman residuals (with respect to zero reward) while minimizing initial values, then the optimal Bellman residuals are exactly the density ratios between the on-policy and offline state-action distributions.
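This result can be confirmed in the tabular setting, where the minimizer of the squared-residual objective is available by solving a linear system; a numpy sketch assuming a randomly generated MDP and a full-support offline distribution d^D:

```python
import numpy as np

# Minimizing (1-gamma) <mu0 pi, Q> + 0.5 sum_{s,a} d^D (gamma P^pi Q - Q)^2
# yields residuals (gamma P^pi Q* - Q*) equal to the ratios d^pi / d^D.
rng = np.random.default_rng(0)
nS, nA, gamma = 4, 3, 0.9
T = rng.dirichlet(np.ones(nS), size=(nS, nA))
mu0 = rng.dirichlet(np.ones(nS))
pi = rng.dirichlet(np.ones(nA), size=nS)

P_pi = np.einsum('sap,pb->sapb', T, pi).reshape(nS * nA, nS * nA)
mu0_pi = (mu0[:, None] * pi).reshape(-1)
I = np.eye(nS * nA)
d_pi = np.linalg.solve(I - gamma * P_pi.T, (1 - gamma) * mu0_pi)
d_D = rng.dirichlet(np.ones(nS * nA))   # full-support offline distribution

# Setting the gradient of the objective to zero gives the linear system
#   B^T diag(d^D) B Q = -(1 - gamma) mu0 pi,   with B = gamma P^pi - I.
B = gamma * P_pi - I
Q_star = np.linalg.solve(B.T @ np.diag(d_D) @ B, -(1 - gamma) * mu0_pi)

assert np.allclose(B @ Q_star, d_pi / d_D)   # residuals = density ratios
```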
Interestingly, the derivations in Nachum et al. (2019a) do not explicitly use Lagrangian or Fenchel-Rockafellar duality, but rather focus on a cleverly chosen change of variables (the so-called DualDICE trick). It is clear from our own derivations that this trick essentially comes from the relationship between f and f* and is simply another way of applying Fenchel-Rockafellar duality.
It is important to note that there is a trade-off introduced by the use of the Fenchel-Rockafellar duality as opposed to the Lagrangian. Namely, the objective (47) involves optimizing a convex function f* of an expectation over next states s' ∼ T(s, a) (through the term P^π Q), whereas in practice one typically only has access to a single empirical sample s' for each (s, a). Many works ignore this problem, and simply consider the single empirical sample as the full distribution T(s, a).
4.4 Summary
We briefly summarize the main takeaways from this section.

The policy evaluation problem may be expressed as an LP, known as the Q-LP, whose primal solution is Q^π.

The dual of this LP has solution d^π.

Taking the Lagrangian of the Q-LP can lead to a doubly robust estimator for ρ(π).

Changing the objective in the dual of the Q-LP does not change its solution d^π.

Changing the objective to an appropriate alternative is a powerful tool. This can lead to a (regularized) Fenchel-Rockafellar dual that is unconstrained, and thus more amenable to stochastic and offline settings.
5 Policy Optimization
The previous section detailed a number of ways to estimate ρ(π). In this section, we show how similar techniques may be applied to the policy optimization problem, which is concerned with finding the optimal policy π* = argmax_π ρ(π).
5.1 The Policy Gradient Theorem
Considering the Lagrangian formulation of ρ(π) given in (35) can provide a simple derivation of the policy gradient theorem in (28). Let L(Q, d; π) be the inner expression in (35). Danskin’s theorem (Bertsekas, 1999) tells us that

∇_π ρ(π) = ∇_π L(Q*, d*; π),   (51)

where Q*, d* are the solutions to min_Q max_{d≥0} L(Q, d; π). Recall that Q*(s, a) = Q^π(s, a) for all (s, a) for which d^π(s, a) > 0 and that d* = d^π. Thus,

∇_π ρ(π) = ∇_π L(Q^π, d^π; π).   (52)

We may compute the gradient of L w.r.t. π term-by-term. For the first term in (35) we have

∇_π (1 - γ)·E_{s0∼μ0, a0∼π(·|s0)}[Q^π(s0, a0)] = (1 - γ)·E_{s0∼μ0, a0∼π(·|s0)}[Q^π(s0, a0)·∇ log π(a0|s0)],   (53)

where we used the general identity ∇_π E_{a∼π(·|s)}[G(a)] = E_{a∼π(·|s)}[G(a)·∇ log π(a|s)]. For the second term of L in (35) we have