Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity

06/15/2021
by Dhruv Malik, et al.

Reinforcement learning (RL) is empirically successful in complex nonlinear Markov decision processes (MDPs) with continuous state spaces. By contrast, the majority of theoretical RL literature requires the MDP to satisfy some form of linear structure in order to guarantee sample-efficient RL. Such efforts typically assume that the transition dynamics or value function of the MDP are described by linear functions of the state features. To resolve this discrepancy between theory and practice, we introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions. We demonstrate that the EPW condition permits sample-efficient RL by providing an algorithm that provably solves MDPs satisfying this condition. Our algorithm requires minimal assumptions on the policy class, which can include multi-layer neural networks with nonlinear activation functions. Notably, the EPW condition is directly motivated by popular gaming benchmarks, and we show that many classic Atari games satisfy this condition. We additionally show the necessity of conditions like EPW by demonstrating that even simple MDPs with slight nonlinearities cannot be solved sample efficiently.
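The abstract does not spell out the EPW condition formally, but the underlying intuition — that acting well may only require planning over a short effective window rather than the full horizon — can be illustrated with a generic receding-horizon sketch. This is not the paper's algorithm; the function names (`plan_with_window`, `lookahead_value`), the toy chain MDP, and the choice of window size are all hypothetical assumptions made for illustration.

```python
# Hypothetical illustration: greedy W-step lookahead planning in a toy MDP.
# This sketches the *idea* of a short planning window, not the paper's method.

def lookahead_value(state, depth, step, reward):
    """Best return achievable from `state` within `depth` further steps."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for action in (-1, +1):  # move left or right on the chain
        nxt = step(state, action)
        best = max(best, reward(nxt) + lookahead_value(nxt, depth - 1, step, reward))
    return best

def plan_with_window(start, horizon, window, step, reward):
    """At every timestep, act greedily using only a `window`-step lookahead."""
    state, total = start, 0.0
    for _ in range(horizon):
        action = max(
            (-1, +1),
            key=lambda a: reward(step(state, a))
            + lookahead_value(step(state, a), window - 1, step, reward),
        )
        state = step(state, action)
        total += reward(state)
    return state, total

# Toy chain MDP: states 0..10, reward equal to the state index, so even a
# 1-step window locally points toward the high-reward end of the chain.
step = lambda s, a: max(0, min(10, s + a))
reward = lambda s: float(s)
final_state, total = plan_with_window(0, horizon=10, window=1, step=step, reward=reward)
# final_state == 10
```

In this toy example a 1-step window suffices because the reward increases smoothly along the chain; the interesting regime the EPW condition speaks to is when a bounded window still suffices even though rewards are not so conveniently shaped.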


Related research:

- 12/12/2022 — Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes: "We study reinforcement learning (RL) with linear function approximation..."
- 04/25/2023 — Provable benefits of general coverage conditions in efficient online RL with function approximation: "In online reinforcement learning (RL), instead of employing standard str..."
- 07/12/2021 — Polynomial Time Reinforcement Learning in Correlated FMDPs with Linear Value Functions: "Many reinforcement learning (RL) environments in practice feature enormo..."
- 05/22/2023 — Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice: "Mirror descent value iteration (MDVI), an abstraction of Kullback-Leible..."
- 08/28/2015 — Learning Efficient Representations for Reinforcement Learning: "Markov decision processes (MDPs) are a well studied framework for solvin..."
- 06/22/2021 — Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations: "There have been many recent advances on provably efficient Reinforcement..."
- 06/23/2022 — Recursive Reinforcement Learning: "Recursion is the fundamental paradigm to finitely describe potentially i..."
