
On the Sample Complexity of Reinforcement Learning with a Generative Model
We consider the problem of learning the optimal action-value function in...

PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method
Emphatic temporal difference (ETD) learning (Sutton et al., 2016) is a s...

Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm
In this paper, we investigate the sample complexity of policy evaluation...

Tighter Sparse Approximation Bounds for ReLU Neural Networks
A well-known line of work (Barron, 1993; Breiman, 1993; Klusowski Ba...

Temporal-difference learning for nonlinear value function approximation in the lazy training regime
We discuss the approximation of the value function for infinite-horizon ...

On the Optimization Dynamics of Wide Hypernetworks
Recent results in the theoretical study of deep learning have shown that...

Robust Structured Statistical Estimation via Conditional Gradient Type Methods
Structured statistical estimation problems are often solved by Condition...
Sample Complexity and Overparameterization Bounds for Projection-Free Neural TD Learning
We study the dynamics of temporal-difference learning with neural network-based value function approximation over a general state space, namely, Neural TD learning. Existing analyses of neural TD learning rely on either an infinite-width analysis or on constraining the network parameters to a (random) compact set; as a result, an extra projection step is required at each iteration. This paper establishes a new convergence analysis of neural TD learning without any projection. We show that projection-free TD learning equipped with a two-layer ReLU network of any width exceeding poly(ν,1/ϵ) converges to the true value function with error ϵ given poly(ν,1/ϵ) iterations or samples, where ν is an upper bound on the RKHS norm of the value function induced by the neural tangent kernel. Our sample complexity and overparameterization bounds are based on a drift analysis of the network parameters as a stopped random process in the lazy training regime.
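To make the setup concrete, the following is a minimal sketch of projection-free neural TD(0) with a two-layer ReLU network, in the NTK-style lazy regime the abstract describes (only the input layer is trained, output signs are fixed at initialization, and no projection constrains the parameters). The 3-state Markov chain, network width, and step size are all hypothetical illustration choices, not taken from the paper, which treats general state spaces.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy Markov reward process (fixed policy), for illustration only.
n_states, gamma, alpha = 3, 0.9, 0.05
P = np.array([[0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4],
              [0.5, 0.2, 0.3]])   # transition matrix under the policy
r = np.array([1.0, 0.0, -1.0])   # reward per state
features = np.eye(n_states)      # one-hot state features

# Two-layer ReLU network: V(x) = (1/sqrt(m)) * sum_i b_i * relu(w_i . x).
# Only the input-layer weights W are trained; the output signs b are fixed
# at initialization, and no projection step constrains W (projection-free).
m = 256                          # network width
W = rng.normal(size=(m, n_states))
b = rng.choice([-1.0, 1.0], size=m)

def value(W, x):
    return (b * np.maximum(W @ x, 0.0)).sum() / np.sqrt(m)

def grad_value(W, x):
    # dV/dW = b_i * 1{w_i . x > 0} * x / sqrt(m)
    active = (W @ x > 0).astype(float)
    return (b * active)[:, None] * x[None, :] / np.sqrt(m)

# TD(0) along a sampled trajectory: semi-gradient update on the TD error,
# with the parameters left entirely unconstrained between iterations.
s = 0
for _ in range(20000):
    s_next = rng.choice(n_states, p=P[s])
    x, x_next = features[s], features[s_next]
    td_err = r[s] + gamma * value(W, x_next) - value(W, x)
    W += alpha * td_err * grad_value(W, x)
    s = s_next

V_hat = np.array([value(W, features[s]) for s in range(n_states)])
V_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)
print(np.max(np.abs(V_hat - V_true)))
```

With a constant step size the iterates settle into a neighborhood of the true value function; the paper's drift analysis bounds how far the unprojected parameters wander from initialization along the way.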