Similarities between policy gradient methods (PGM) in Reinforcement learning (RL) and supervised learning (SL)

04/12/2019
by   Eric Benhamou, et al.

Reinforcement learning (RL) is about sequential decision making and is traditionally contrasted with supervised learning (SL) and unsupervised learning (USL). In RL, given the current state, the agent makes a decision that may influence the next state, whereas in SL (and USL) the next state remains the same regardless of the decisions taken, in either batch or online learning. Although this difference between SL and RL is fundamental, there are connections that have been overlooked. In particular, we prove in this paper that policy gradient methods can be cast as a supervised learning problem in which the true labels are replaced with discounted rewards. We provide a new proof of policy gradient methods (PGM) that emphasizes the tight link with cross entropy and supervised learning. We provide a simple experiment in which we interchange labels and pseudo rewards. We conclude that other relationships with SL could be made if we modify the reward functions wisely.
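
The link between policy gradients and cross entropy described in the abstract can be illustrated with a short sketch. This is not the paper's proof or its experiment. It assumes a softmax policy over discrete actions and a REINFORCE-style estimator, and every function name below is hypothetical. Under those assumptions, the surrogate loss is a cross entropy in which the sampled action plays the role of the label and each sample is weighted by its discounted return; setting all weights to 1 gives ordinary supervised cross entropy.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def weighted_cross_entropy(logits, actions, weights):
    """Mean of -weights[t] * log pi(actions[t] | state t).

    weights = 1 everywhere: standard cross entropy with `actions` as labels.
    weights = discounted returns: REINFORCE surrogate loss (assumed setting).
    """
    probs = softmax(logits)
    picked = probs[np.arange(len(actions)), actions]
    return -np.mean(weights * np.log(picked))


def weighted_cross_entropy_grad(logits, actions, weights):
    """Analytic gradient of the loss above with respect to the logits."""
    probs = softmax(logits)
    one_hot = np.eye(logits.shape[1])[actions]
    return weights[:, None] * (probs - one_hot) / len(actions)


if __name__ == "__main__":
    # Toy episode: 3 time steps, 3 possible actions (all values are made up).
    logits = np.array([[0.1, -0.2, 0.3],
                       [0.0, 0.5, -0.5],
                       [1.0, 0.0, -1.0]])
    actions = np.array([2, 0, 1])          # sampled actions act as labels
    rewards = [1.0, 0.0, 2.0]

    returns = discounted_returns(rewards, gamma=0.9)
    sl_loss = weighted_cross_entropy(logits, actions, np.ones(3))
    pg_loss = weighted_cross_entropy(logits, actions, returns)
    print("supervised cross entropy:", sl_loss)
    print("return-weighted cross entropy:", pg_loss)
```

The two losses differ only in the per-sample weights, which is the sense in which the sketch treats discounted rewards as a replacement for true labels.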


