Steady State Analysis of Episodic Reinforcement Learning

11/12/2020
by Huang Bojun, et al.

This paper proves that the episodic learning environment of every finite-horizon decision task has a unique steady state under any behavior policy, and that the marginal distribution of the agent's input approaches the steady-state distribution in essentially all episodic learning processes. This observation reverses the conventional wisdom: while steady states are usually presumed to exist in continual learning and are considered less relevant in episodic learning, it turns out they are guaranteed to exist for the latter. Based on this insight, the paper further develops connections between episodic and continual RL for several important concepts that have been treated separately in the two RL formalisms. Practically, the existence of a unique and approachable steady state enables a general, reliable, and efficient way to collect data in episodic RL tasks, which the paper applies to policy gradient algorithms as a demonstration, based on a new steady-state policy gradient theorem. The paper also proposes and empirically evaluates a perturbation method that facilitates rapid mixing in real-world tasks.
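
The steady-state claim can be illustrated on a toy example. The sketch below is not taken from the paper: it assumes a hypothetical three-state episodic task under a fixed policy, in which the terminal state resets to the start state so that successive episodes form one Markov chain. The matrix `P` and the helper `stationary_distribution` are illustrative names only. The code solves for the stationary distribution and checks that the marginal state distribution, started from the initial state, approaches it.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi @ P = pi subject to sum(pi) = 1 via least squares."""
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the normalization row.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical toy chain (an assumption, not from the paper):
# state 0 = start, state 1 = intermediate, state 2 = terminal.
# The terminal state resets to the start state with probability 1,
# so episodes of length 2 and 3 alternate at random.
P = np.array([
    [0.0, 0.5, 0.5],  # start -> intermediate or terminal
    [0.0, 0.0, 1.0],  # intermediate -> terminal
    [1.0, 0.0, 0.0],  # terminal -> reset to start
])

pi = stationary_distribution(P)

# Marginal distribution of the state after t steps, starting at state 0.
mu = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    mu = mu @ P

print(np.round(pi, 3))  # approximately [0.4, 0.2, 0.4]
print(np.round(mu, 3))  # matches pi to numerical precision
```

In this toy chain the two possible episode lengths are coprime, which makes the chain aperiodic, so the marginal distribution itself converges rather than only its time average.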

research
10/13/2022

Policy Gradient With Serial Markov Chain Reasoning

We introduce a new framework that performs decision-making in reinforcem...
research
01/31/2022

Steady-State Error Compensation in Reference Tracking and Disturbance Rejection Problems for Reinforcement Learning-Based Control

Reinforcement learning (RL) is a promising, upcoming topic in automatic ...
research
10/27/2020

Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient

Off-policy Reinforcement Learning (RL) holds the promise of better data ...
research
12/13/2021

Continual Learning In Environments With Polynomial Mixing Times

The mixing time of the Markov chain induced by a policy limits performan...
research
07/12/2022

Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning

In lifelong learning, an agent learns throughout its entire life without...
research
12/10/2022

Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees

We revisit the domain of off-policy policy optimization in RL from the p...
