ConQUR: Mitigating Delusional Bias in Deep Q-learning

02/27/2020
by Andy Su, et al.

Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias by training Q-approximators with labels that are "consistent" with the underlying greedy policy class. We introduce a simple penalization scheme that encourages Q-labels used across training batches to remain (jointly) consistent with the expressible policy class. We also propose a search framework that allows multiple Q-approximators to be generated and tracked, thus mitigating the effect of premature (implicit) policy commitments. Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.
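The penalization idea can be pictured with a small sketch. This is an illustration under stated assumptions, not the paper's implementation: it assumes each next state already has an action assigned by an earlier Q-label, and it adds a hinge-style term that grows when the current Q-values would make some other action greedy at that state. The names `consistency_penalty`, `penalized_loss`, `assigned_action`, and the weight `lam` are hypothetical.

```python
def consistency_penalty(q_next, assigned_action):
    """Hinge penalty for one next state (illustrative assumption).

    q_next: list of current Q-values, one per action, at the next state.
    assigned_action: index of the action an earlier Q-label committed to.
    Returns 0.0 when the assigned action is already greedy, and otherwise
    the total amount by which other actions exceed it.
    """
    q_assigned = q_next[assigned_action]
    return sum(max(q - q_assigned, 0.0) for q in q_next)


def penalized_loss(batch, gamma=0.99, lam=0.5):
    """Mean squared TD error plus lam times the consistency penalty.

    Each batch element is a tuple:
      (q_pred, reward, q_next_target, q_next, assigned_action)
    where q_next_target holds target-network Q-values used for the label
    and q_next holds the Q-values of the network being trained.
    """
    total = 0.0
    for q_pred, reward, q_next_target, q_next, assigned_action in batch:
        # Label bootstraps from the assigned action, not a fresh max.
        label = reward + gamma * q_next_target[assigned_action]
        td_error = (label - q_pred) ** 2
        total += td_error + lam * consistency_penalty(q_next, assigned_action)
    return total / len(batch)
```

Setting `lam` to zero in this sketch recovers an ordinary squared TD loss on the assigned-action labels, so the extra term is what discourages the greedy policy from drifting away from earlier commitments.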


