Optimism in Reinforcement Learning with Generalized Linear Function Approximation

12/09/2019
by Yining Wang, et al.

We design a new provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation. We analyze the algorithm under a new expressivity assumption that we call "optimistic closure," which is strictly weaker than assumptions from prior analyses for the linear setting. With optimistic closure, we prove that our algorithm enjoys a regret bound of Õ(√(d^3 T)) where d is the dimensionality of the state-action features and T is the number of episodes. This is the first statistically and computationally efficient algorithm for reinforcement learning with generalized linear functions.
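As a rough illustration only (not the paper's exact algorithm), the sketch below shows the standard optimistic recipe the abstract alludes to: model the action value with a generalized linear function Q(s, a) ≈ link(φ(s, a)ᵀθ), then act greedily with respect to that estimate plus an elliptical confidence bonus β‖φ(s, a)‖_{Λ⁻¹}. The function names, the logistic link, and the bonus scale beta here are illustrative assumptions.

```python
# Hedged sketch of optimistic action selection with a generalized linear
# Q-value model, Q(s, a) ~ link(phi(s, a)^T theta). All names (phi_all,
# link, beta, lam) and the bonus form are assumptions for illustration,
# not the paper's algorithm.
import numpy as np


def optimistic_action(phi_all, theta, Lambda, beta,
                      link=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Pick the action maximizing link(phi^T theta) + beta * ||phi||_{Lambda^-1}.

    phi_all : (num_actions, d) feature matrix for the current state
    theta   : (d,) current GLM parameter estimate
    Lambda  : (d, d) regularized design matrix sum_i phi_i phi_i^T + lam * I
    beta    : exploration-bonus scale (set from the confidence analysis)
    """
    Lambda_inv = np.linalg.inv(Lambda)
    # Elliptical-norm bonus: larger in feature directions seen less often.
    bonus = beta * np.sqrt(np.einsum("ad,dk,ak->a", phi_all, Lambda_inv, phi_all))
    ucb = link(phi_all @ theta) + bonus
    return int(np.argmax(ucb))


def update_design(Lambda, phi):
    """Rank-one update of the design matrix after observing feature phi."""
    return Lambda + np.outer(phi, phi)


# Tiny usage example with random features (d = 3, four actions).
rng = np.random.default_rng(0)
d, num_actions, lam, beta = 3, 4, 1.0, 0.5
theta = rng.normal(size=d)          # stand-in for a fitted GLM estimate
Lambda = lam * np.eye(d)
phi_all = rng.normal(size=(num_actions, d))
a = optimistic_action(phi_all, theta, Lambda, beta)
Lambda = update_design(Lambda, phi_all[a])
print("chosen action:", a)
```

In this kind of scheme the bonus term is what makes the value estimate optimistic: it upper-bounds the plausible value in directions where the design matrix Λ carries little information, which drives exploration and, under the paper's "optimistic closure" assumption, yields the stated Õ(√(d³T)) regret.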
