Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity

02/17/2020
by Simon S. Du, et al.

The current paper studies the problem of agnostic Q-learning with function approximation in deterministic systems where the optimal Q-function is approximable by a function in the class F with approximation error δ > 0. We propose a novel recursion-based algorithm and show that if δ = O(ρ/√(dim_E)), then one can find the optimal policy using O(dim_E) trajectories, where ρ is the gap between the optimal Q-value of the best actions and that of the second-best actions and dim_E is the Eluder dimension of F. Our result has two implications: 1) In conjunction with the lower bound in [Du et al., ICLR 2020], our upper bound suggests that the condition δ = Θ(ρ/√(dim_E)) is necessary and sufficient for algorithms with polynomial sample complexity. 2) In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity Θ(dim_E) is tight even in the agnostic setting. Therefore, we settle the open problem on agnostic Q-learning proposed in [Wen and Van Roy, NIPS 2013]. We further extend our algorithm to the stochastic reward setting and obtain similar results.
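
For readers who prefer symbols, the quantities named in the abstract can be written out roughly as follows. This is a sketch under stated assumptions, not the paper's own statement: the abstract does not specify the norm, so the supremum norm is assumed here, and the notation Q^*, V^*, \mathcal{S}, \mathcal{A} and the unspecified constants c, C are introduced only for illustration.

```latex
% Agnostic assumption (sup-norm is an assumption; the abstract only says
% "approximable ... with approximation error \delta"):
\[
  \exists\, f \in \mathcal{F} : \quad
  \sup_{(s,a) \in \mathcal{S} \times \mathcal{A}}
  \bigl| f(s,a) - Q^*(s,a) \bigr| \le \delta .
\]

% Gap between the best and second-best actions, with
% V^*(s) = \max_{a} Q^*(s,a) (one possible formalization):
\[
  \rho = \inf_{s \in \mathcal{S}}
  \Bigl( V^*(s) - \max_{a \,:\, Q^*(s,a) < V^*(s)} Q^*(s,a) \Bigr) .
\]

% Upper bound claimed in the abstract, for unspecified constants c, C > 0:
\[
  \delta \le c \cdot \frac{\rho}{\sqrt{\dim_E}}
  \quad \Longrightarrow \quad
  \#\,\text{trajectories} \le C \cdot \dim_E ,
\]
% where \dim_E denotes the Eluder dimension of \mathcal{F}.
```

Read this way, the two implications in the abstract say that the threshold on δ cannot be loosened beyond constant factors without losing polynomial sample complexity, and that the trajectory count cannot be improved beyond constant factors.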


