Stochastic Gradient Descent with Dependent Data for Offline Reinforcement Learning

02/06/2022
by Jing Dong, et al.

In reinforcement learning (RL), offline learning decouples learning from data collection, which helps in dealing with the exploration-exploitation tradeoff and enables data reuse in many applications. In this work, we study two offline learning tasks: policy evaluation and policy learning. We formulate policy evaluation as a stochastic optimization problem and show that it can be solved using approximate stochastic gradient descent (aSGD) with time-dependent data. We show that aSGD achieves an Õ(1/t) convergence rate when the loss function is strongly convex, and that this rate is independent of the discount factor γ. This result extends to algorithms that make approximately contractive iterations, such as TD(0). The policy evaluation algorithm is then combined with the policy iteration algorithm to learn the optimal policy. To achieve ϵ accuracy, the complexity of the algorithm is Õ(ϵ^-2(1-γ)^-5), which matches the complexity bound for classic online RL algorithms such as Q-learning.
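As a rough illustration of policy evaluation from time-dependent data, the sketch below runs standard tabular TD(0) along a single Markov trajectory, so consecutive samples are correlated rather than i.i.d. It is not the aSGD algorithm from the paper. The function names, the two-state chain in the usage note, and the step-size constants `c` and `t0` are assumptions chosen only for the example.

```python
import random


def td0_single_trajectory(P, r, gamma, steps, c=10.0, t0=100.0, seed=0):
    """Tabular TD(0) on one trajectory of a Markov reward process.

    P[s] is the row of transition probabilities out of state s under the
    fixed policy being evaluated, and r[s] is the reward received in s.
    The step size c / (t + t0) decays like 1/t; c and t0 are illustrative
    constants, not values taken from the paper.
    """
    rng = random.Random(seed)
    n = len(P)
    V = [0.0] * n
    s = 0
    for t in range(1, steps + 1):
        # The next state depends on the current one, so samples are dependent.
        s_next = rng.choices(range(n), weights=P[s])[0]
        alpha = c / (t + t0)
        td_error = r[s] + gamma * V[s_next] - V[s]
        V[s] += alpha * td_error
        s = s_next
    return V


def exact_values(P, r, gamma, sweeps=2000):
    """Reference solution: iterate the Bellman operator to its fixed point."""
    n = len(P)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [r[i] + gamma * sum(P[i][j] * V[j] for j in range(n))
             for i in range(n)]
    return V
```

For example, on a hypothetical two-state chain with `P = [[0.9, 0.1], [0.2, 0.8]]`, `r = [1.0, 0.0]`, and `gamma = 0.5`, the output of `td0_single_trajectory(P, r, 0.5, steps=200000)` can be compared entry by entry against `exact_values(P, r, 0.5)` to see how close the single-trajectory estimate gets.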

research
12/03/2022

Constrained Reinforcement Learning via Dissipative Saddle Flow Dynamics

In constrained reinforcement learning (C-RL), an agent seeks to learn fr...
research
05/09/2023

Assessment of Reinforcement Learning Algorithms for Nuclear Power Plant Fuel Optimization

The nuclear fuel loading pattern optimization problem has been studied s...
research
11/23/2022

On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation

Sample-efficient offline reinforcement learning (RL) with linear functio...
research
01/30/2021

Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes

We present new policy mirror descent (PMD) methods for solving reinforce...
research
06/02/2020

Learning optimal environments using projected stochastic gradient ascent

In this work, we generalize the direct policy search algorithms to an al...
research
06/30/2023

Resetting the Optimizer in Deep RL: An Empirical Study

We focus on the task of approximating the optimal value function in deep...
research
05/28/2021

Improving Generalization in Mountain Car Through the Partitioned Parameterized Policy Approach via Quasi-Stochastic Gradient Descent

The reinforcement learning problem of finding a control policy that mini...
