Optimism in Reinforcement Learning with Generalized Linear Function Approximation

by Yining Wang et al.

We design a new provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation. We analyze the algorithm under a new expressivity assumption that we call "optimistic closure," which is strictly weaker than assumptions from prior analyses for the linear setting. With optimistic closure, we prove that our algorithm enjoys a regret bound of Õ(√(d^3 T)) where d is the dimensionality of the state-action features and T is the number of episodes. This is the first statistically and computationally efficient algorithm for reinforcement learning with generalized linear functions.
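To make the setting concrete, here is a minimal sketch (not the paper's actual algorithm) of the optimism principle with generalized linear function approximation: the Q-value is modeled as a link function applied to a linear score of the state-action features, plus an elliptical confidence bonus. The names `optimistic_q`, the `tanh` link, and the bonus coefficient `beta` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def optimistic_q(phi, theta_hat, Sigma_inv, beta, link=np.tanh):
    """Optimistic Q-value sketch: link(<theta_hat, phi>) plus an
    exploration bonus beta * ||phi||_{Sigma^{-1}} (elliptical bonus)."""
    mean = link(phi @ theta_hat)                 # GLM prediction
    bonus = beta * np.sqrt(phi @ Sigma_inv @ phi)  # uncertainty bonus
    return mean + bonus

# Toy demo: act greedily with respect to the optimistic values.
d = 3
theta_hat = np.array([0.5, -0.2, 0.1])   # current parameter estimate
Sigma_inv = np.eye(d)                    # inverse feature covariance
actions = [np.array([1.0, 0.0, 0.0]),    # feature vectors of two actions
           np.array([0.0, 1.0, 0.0])]
values = [optimistic_q(phi, theta_hat, Sigma_inv, beta=0.1) for phi in actions]
best = int(np.argmax(values))            # action chosen by optimism
```

Acting greedily with respect to such optimistic estimates drives exploration toward state-action pairs whose features are poorly covered by past data, which is the mechanism behind the √T-style regret bounds in this line of work.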



