Lookahead-Bounded Q-Learning

06/28/2020
by Ibrahim El Shar, et al.

We introduce the lookahead-bounded Q-learning (LBQL) algorithm, a new, provably convergent variant of Q-learning that seeks to improve the performance of standard Q-learning in stochastic environments through the use of “lookahead” upper and lower bounds. To do this, LBQL employs previously collected experience and each iteration's state-action values as dual feasible penalties to construct a sequence of sampled information relaxation problems. The solutions to these problems provide estimated upper and lower bounds on the optimal value, which we track via stochastic approximation. These bounds are then used to constrain the Q-learning iterates at every iteration. Numerical experiments on benchmark problems show that LBQL converges faster and is more robust to hyperparameters than standard Q-learning and several related techniques. Our approach is particularly appealing in problems that require expensive simulations or real-world interactions.
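
The bound-and-project idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify how the sampled information relaxation problems produce bound samples, so the sketch takes those samples as given inputs. The names `sa_step` and `bounded_q_update` and all numbers are hypothetical, chosen only for illustration.

```python
def sa_step(estimate, sample, step):
    """One stochastic-approximation step: move the estimate toward a noisy sample."""
    return (1.0 - step) * estimate + step * sample


def bounded_q_update(q, lower, upper, s, a, reward, s_next, alpha, gamma):
    """Standard Q-learning update followed by projection onto [lower, upper].

    q, lower, upper are nested lists indexed as [state][action].
    The bounds are assumed to be supplied externally (hypothetical inputs).
    """
    target = reward + gamma * max(q[s_next])
    updated = sa_step(q[s][a], target, alpha)
    # Projection: keep the iterate between the tracked bounds.
    q[s][a] = min(max(updated, lower[s][a]), upper[s][a])
    return q[s][a]


# Toy usage with two states and two actions (all numbers are made up).
q = [[0.0, 0.0], [0.0, 0.0]]
lower = [[-1.0, -1.0], [-1.0, -1.0]]
upper = [[0.5, 0.5], [0.5, 0.5]]

# A noisy upper-bound sample arrives for (s=0, a=1); track it by averaging.
upper[0][1] = sa_step(upper[0][1], 0.4, 0.5)

# The unconstrained update would overshoot the upper bound; projection caps it.
value = bounded_q_update(q, lower, upper, s=0, a=1, reward=2.0, s_next=1,
                         alpha=0.5, gamma=0.9)
```

In this toy run the unconstrained Q-learning step would move the entry to 1.0, which exceeds the tracked upper bound, so the projection returns the bound instead. This is the sense in which the bounds constrain the iterates.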

