Understanding the Impact of Data Distribution on Q-learning with Function Approximation

11/23/2021
by Pedro P. Santos, et al.

In this work, we study the interplay between the data distribution and Q-learning-based algorithms with function approximation. We provide a theoretical and empirical analysis of why different properties of the data distribution can help regulate sources of algorithmic instability. First, we revisit theoretical bounds on the performance of approximate dynamic programming algorithms. Second, we provide a novel four-state MDP that highlights the impact of the data distribution on the performance of a Q-learning algorithm with function approximation, in both online and offline settings. Finally, we experimentally assess the impact of data distribution properties on the performance of an offline deep Q-network algorithm. Our results show that: (i) the data distribution needs to possess certain properties in order to learn robustly in an offline setting, namely low distance to the distributions induced by optimal policies of the MDP and high coverage over the state-action space; and (ii) high-entropy data distributions can contribute to mitigating sources of algorithmic instability.
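To make the three properties named above concrete, the following is a minimal sketch of how they might be estimated from a logged dataset of state-action pairs in a small tabular MDP. It is not the authors' code: the function names are hypothetical, and the use of total variation as the "distance" to a reference distribution is an assumption, since the abstract does not say which distance is used.

```python
import math
from collections import Counter


def empirical_distribution(pairs):
    """Empirical distribution over (state, action) pairs in a dataset."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {sa: c / total for sa, c in counts.items()}


def entropy(dist):
    """Shannon entropy (in nats) of a distribution given as a dict."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def coverage(dist, n_states, n_actions):
    """Fraction of the state-action space that appears in the data."""
    return len(dist) / (n_states * n_actions)


def total_variation(p, q):
    """Total variation distance between two dict-valued distributions.

    Assumption: the abstract does not specify a distance; total variation
    is used here only because it stays finite when supports differ.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical datasets for an MDP with 2 states and 2 actions.
uniform_data = [(0, 0), (0, 1), (1, 0), (1, 1)]
narrow_data = [(0, 0)] * 4

mu_uniform = empirical_distribution(uniform_data)
mu_narrow = empirical_distribution(narrow_data)

print(entropy(mu_uniform), coverage(mu_uniform, 2, 2))
print(entropy(mu_narrow), coverage(mu_narrow, 2, 2))
print(total_variation(mu_uniform, mu_narrow))
```

In this toy setting the uniform dataset has full coverage and maximal entropy, while the narrow dataset covers a single state-action pair. In practice the reference distribution passed to `total_variation` would be the one induced by an optimal policy, which is generally unknown and would itself have to be estimated.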


