Borrowing From the Future: An Attempt to Address Double Sampling

12/01/2019
by Yuhua Zhu, et al.

For model-free reinforcement learning, the main difficulty of stochastic Bellman residual minimization is the double sampling problem: unbiased stochastic gradient descent requires two independent samples of the next state, but the model-free setting provides only one. We propose new algorithms that address this problem by borrowing extra randomness from the future. When the transition kernel varies slowly with respect to the state, we show that the training trajectory of the new algorithms stays close to that of unbiased stochastic gradient descent. We apply the new algorithms to policy evaluation in both tabular and neural network settings to confirm the theoretical findings.
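
The double sampling problem described above can be made concrete with a small tabular sketch. The code below is an illustration only, not the paper's algorithm: it assumes a tabular value function `theta`, a fixed reward `r`, a single current state `s`, and a known next-state distribution `p` (all hypothetical names), and it computes expectations exactly by enumeration so the bias is visible without sampling noise. It compares the gradient of the squared Bellman residual with the expected gradient estimate built from one next-state sample and from two independent ones.

```python
import numpy as np

def td_error(theta, s, s_next, r, gamma):
    # Bellman residual for one observed transition s -> s_next.
    return r + gamma * theta[s_next] - theta[s]

def grad_td_error(n, s, s_next, gamma):
    # Gradient of the residual with respect to the tabular values theta.
    g = np.zeros(n)
    g[s_next] += gamma
    g[s] -= 1.0
    return g

def true_gradient(p, theta, s, r, gamma):
    # Gradient of 0.5 * (E[delta])^2: a product of two expectations.
    n = len(theta)
    mean_delta = sum(p[j] * td_error(theta, s, j, r, gamma) for j in range(n))
    mean_grad = sum(p[j] * grad_td_error(n, s, j, gamma) for j in range(n))
    return mean_delta * mean_grad

def expected_single_sample(p, theta, s, r, gamma):
    # One sample reused in both factors: E[delta(s') * grad delta(s')].
    n = len(theta)
    return sum(
        p[j] * td_error(theta, s, j, r, gamma) * grad_td_error(n, s, j, gamma)
        for j in range(n)
    )

def expected_two_sample(p, theta, s, r, gamma):
    # Independent samples s'_1, s'_2: E[delta(s'_1)] * E[grad delta(s'_2)].
    n = len(theta)
    return sum(
        p[j] * p[k]
        * td_error(theta, s, j, r, gamma)
        * grad_td_error(n, s, k, gamma)
        for j in range(n)
        for k in range(n)
    )

# Hypothetical three-state example: from state 0, move to state 1 or 2
# with equal probability.
p = np.array([0.0, 0.5, 0.5])
theta = np.array([0.0, 1.0, 3.0])
s, r, gamma = 0, 0.0, 0.9

g_true = true_gradient(p, theta, s, r, gamma)
g_single = expected_single_sample(p, theta, s, r, gamma)
g_double = expected_two_sample(p, theta, s, r, gamma)

print("true gradient:         ", g_true)
print("single-sample estimate:", g_single)
print("two-sample estimate:   ", g_double)
```

In this sketch the two-sample estimate matches the true gradient in expectation, while the single-sample estimate differs by a covariance term between the residual and its gradient. That gap is the bias that a second, independent next-state sample removes, and which the model-free setting does not supply directly.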


research
06/11/2020

Borrowing From the Future: Addressing Double Sampling in Model-free Control

In model-free reinforcement learning, the temporal difference method and...
research
05/15/2019

Hybrid Stochastic Gradient Descent Algorithms for Stochastic Nonconvex Optimization

We introduce a hybrid stochastic estimator to design stochastic gradient...
research
05/01/2022

Ridgeless Regression with Random Features

Recent theoretical studies illustrated that kernel ridgeless regression ...
research
09/29/2022

Computational Complexity of Sub-linear Convergent Algorithms

Optimizing machine learning algorithms that are used to solve the object...
research
03/24/2022

Local optimisation of Nyström samples through stochastic gradient descent

We study a relaxed version of the column-sampling problem for the Nyströ...
research
06/27/2018

Empirical Risk Minimization and Stochastic Gradient Descent for Relational Data

Empirical risk minimization is the principal tool for prediction problem...
research
12/28/2017

Gradient Regularization Improves Accuracy of Discriminative Models

Regularizing the gradient norm of the output of a neural network with re...
