Steady State Analysis of Episodic Reinforcement Learning

11/12/2020
by Huang Bojun, et al.

This paper proves that the episodic learning environment of every finite-horizon decision task has a unique steady state under any behavior policy, and that the marginal distribution of the agent's input indeed approaches this steady-state distribution in essentially all episodic learning processes. This observation supports a mindset that reverses conventional wisdom: while steady states are usually presumed to exist in continual learning and are considered less relevant in episodic learning, it turns out they are guaranteed to exist for the latter. Based on this insight, the paper further develops connections between episodic and continual RL for several important concepts that have been treated separately in the two RL formalisms. Practically, the existence of a unique and approachable steady state enables a general, reliable, and efficient way to collect data in episodic RL tasks, which the paper demonstrates by applying it to policy gradient algorithms, based on a new steady-state policy gradient theorem. The paper also proposes and empirically evaluates a perturbation method that facilitates rapid mixing in real-world tasks.
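To illustrate the practical point about data collection, the following is a minimal sketch (not the paper's algorithm) of how one might sample from the long-run marginal distribution of an episodic task: the environment is treated as a single long-running chain in which episode termination simply triggers a reset, and transitions are kept only after a burn-in period so they approximately follow the steady-state distribution. The names env, policy, batch_size, and burn_in are placeholders assumed here for illustration; env is assumed to follow a Gym-style reset/step interface.

    # Hedged sketch of steady-state data collection in an episodic environment.
    # `policy` is any fixed behavior policy mapping a state to an action.
    def collect_steady_state_batch(env, policy, batch_size=1024, burn_in=5000):
        state = env.reset()
        samples = []
        t = 0
        while len(samples) < batch_size:
            action = policy(state)
            next_state, reward, done, _ = env.step(action)
            t += 1
            if t > burn_in:
                # after burn-in, transitions are treated as (approximate)
                # draws from the steady-state distribution of the chain
                samples.append((state, action, reward, next_state, done))
            # an episode ending is just another step of the same long-run
            # process: restart the environment and keep sampling
            state = env.reset() if done else next_state
        return samples

Under the paper's result that the marginal state distribution converges to a unique steady state, batches collected this way could serve as steady-state data for, e.g., policy gradient updates; the burn-in length needed in practice depends on how quickly the chain mixes, which is what the proposed perturbation method aims to accelerate.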
