Borrowing From the Future: Addressing Double Sampling in Model-free Control

06/11/2020
by Yuhua Zhu, et al.

In model-free reinforcement learning, the temporal difference method and its variants become unstable when combined with nonlinear function approximation. Bellman residual minimization with stochastic gradient descent (SGD) is more stable, but it suffers from the double sampling problem: an unbiased gradient estimate requires two independent samples of the next state given the current state, and often only one sample is available. Recently, the authors of [Zhu et al., 2020] introduced the borrowing from the future (BFF) algorithm to address this issue for the prediction problem. The main idea is to borrow extra randomness from the future to approximately re-sample the next state when the underlying dynamics of the problem are sufficiently smooth. This paper extends the BFF algorithm to action-value function based model-free control. We prove that BFF is close to unbiased SGD when the underlying dynamics vary slowly with respect to actions. We confirm our theoretical findings with numerical simulations.
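The double sampling problem can be illustrated with a small numerical sketch. The toy setup below is an assumption made purely for illustration and is not taken from the paper: a one-dimensional state, dynamics s' = s + noise, zero reward, and a linear value function V(s) = THETA * s (the prediction setting rather than the action-value control setting the paper addresses). The "borrowed" mode is one possible reading of the abstract's idea, in which the future increment s2 - s1 is reused to build a substitute next state; the exact update in the paper may differ.

```python
import random

GAMMA = 0.9   # discount factor (assumed value)
THETA = 2.0   # parameter of the assumed linear value function V(s) = THETA * s
SIGMA = 1.0   # noise scale of the toy dynamics
S0 = 1.0      # current state at which the gradient is estimated


def step(rng, s):
    """Toy dynamics with state-independent increments: s' = s + SIGMA * noise."""
    return s + SIGMA * rng.gauss(0.0, 1.0)


def residual(s, s_next):
    """Bellman residual with zero reward: GAMMA * V(s') - V(s)."""
    return GAMMA * THETA * s_next - THETA * s


def grad_residual(s, s_next):
    """Derivative of the residual with respect to THETA."""
    return GAMMA * s_next - s


def estimate(mode, n=200_000, seed=0):
    """Monte Carlo average of residual(s1) * grad_residual(s1_alt) at S0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        s1 = step(rng, S0)
        if mode == "single":
            s1_alt = s1                  # reuse the only sample (biased)
        elif mode == "double":
            s1_alt = step(rng, S0)       # independent re-sample (needs a simulator)
        elif mode == "borrowed":
            s2 = step(rng, s1)           # the next step along the same trajectory
            s1_alt = S0 + (s2 - s1)      # hypothetical: reuse the future increment
        else:
            raise ValueError(f"unknown mode: {mode}")
        total += residual(S0, s1) * grad_residual(S0, s1_alt)
    return total / n


# Gradient of 0.5 * (E[residual | S0])**2 with respect to THETA, in closed form.
TRUE_GRAD = THETA * S0**2 * (GAMMA - 1.0) ** 2        # 0.02 for these constants
# Extra covariance term picked up when the same sample is used twice.
SINGLE_SAMPLE_BIAS = GAMMA**2 * THETA * SIGMA**2      # 1.62 for these constants

if __name__ == "__main__":
    for mode in ("single", "double", "borrowed"):
        print(mode, round(estimate(mode), 3))
    print("true gradient:", TRUE_GRAD)
```

In this toy model the increments do not depend on the state, so the borrowed sample is an exact independent re-sample and the estimate matches the double-sample one. When the increment distribution varies with the state, the substitute would only be approximate, which is consistent with the smoothness condition stated in the abstract.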

research
12/01/2019

Borrowing From the Future: An Attempt to Address Double Sampling

For model-free reinforcement learning, the main difficulty of stochastic...
research
11/15/2022

Model free Shapley values for high dimensional data

A model-agnostic variable importance method can be used with arbitrary p...
research
12/17/2018

Double Deep Q-Learning for Optimal Execution

Optimal trade execution is an important problem faced by essentially all...
research
03/24/2020

Finite-Time Analysis of Stochastic Gradient Descent under Markov Randomness

Motivated by broad applications in reinforcement learning and machine le...
research
11/15/2018

Reward-estimation variance elimination in sequential decision processes

Policy gradient methods are very attractive in reinforcement learning du...
research
08/09/2021

Modified Double DQN: addressing stability

Inspired by double q learning algorithm, the double DQN algorithm was or...
research
08/06/2020

Heterogeneous Idealization of Ion Channel Recordings – Open Channel Noise

We propose a new model-free segmentation method for idealizing ion chann...
