Policy Gradient With Serial Markov Chain Reasoning

10/13/2022
by Edoardo Cetin, et al.

We introduce a new framework that performs decision-making in reinforcement learning (RL) as an iterative reasoning process. We model agent behavior as the steady-state distribution of a parameterized reasoning Markov chain (RMC), optimized with a new tractable estimate of the policy gradient. We perform action selection by simulating the RMC for enough reasoning steps to approach its steady-state distribution. We show that our framework has several useful properties that are inherently missing from traditional RL. For instance, it allows agent behavior to approximate any continuous distribution over actions by parameterizing the RMC with a simple Gaussian transition function. Moreover, the number of reasoning steps needed to reach convergence can scale adaptively with the difficulty of each action-selection decision, and convergence can be accelerated by re-using past solutions. Our resulting algorithm achieves state-of-the-art performance on popular MuJoCo and DeepMind Control benchmarks, for both proprioceptive and pixel-based tasks.
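As a rough illustration of selecting an action by simulating a chain toward its steady state, the sketch below uses a toy Gaussian transition. It is not the authors' implementation: `gaussian_transition`, `select_action`, the `tanh` stand-in for a learned network, and the parameters `k`, `sigma`, and `n_steps` are all assumptions made for this example. The toy transition is linear in the previous action, so its steady state is known in closed form, a Gaussian with mean `tanh(state)` and variance `sigma**2 / (1 - (1 - k)**2)`, which makes the convergence easy to check. A learned, nonlinear transition would not be restricted to a Gaussian steady state.

```python
import numpy as np


def gaussian_transition(state, action, rng, k=0.5, sigma=0.1):
    """Hypothetical transition: sample a' ~ N(mean(state, action), sigma^2).

    The mean pulls the previous action toward tanh(state), a toy stand-in
    for the output of a learned network.
    """
    target = np.tanh(state)
    mean = (1.0 - k) * action + k * target
    return mean + sigma * rng.standard_normal(action.shape)


def select_action(state, rng, n_steps=50, init_action=None):
    """Simulate the chain for n_steps and return the final sample.

    Passing init_action (for example, a previously selected action) starts
    the chain from that point instead of from zeros.
    """
    action = np.zeros_like(state) if init_action is None else init_action
    for _ in range(n_steps):
        action = gaussian_transition(state, action, rng)
    return action


# Run many independent chains for the same state to inspect the
# empirical distribution of the selected actions.
rng = np.random.default_rng(0)
state = np.full((5000, 1), 0.3)
actions = select_action(state, rng)
print(actions.mean(), actions.std())
```

In this toy setting the action has the same shape as the state, which is a simplification for the example only. Starting the chain from a previous action via `init_action` loosely mirrors the idea of re-using past solutions; a fixed `n_steps` is used here in place of any adaptive stopping rule.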


