Dual Generator Offline Reinforcement Learning

11/02/2022
by   Quan Vuong, et al.

In offline RL, constraining the learned policy to remain close to the data is essential to prevent the policy from outputting out-of-distribution (OOD) actions with erroneously overestimated values. In principle, generative adversarial networks (GANs) offer an elegant way to do this, with the discriminator directly providing a probability that quantifies distributional shift. In practice, however, GAN-based offline RL methods have not performed as well as alternative approaches, perhaps because the generator is trained both to fool the discriminator and to maximize return, two objectives that can be at odds with each other. In this paper, we show that the issue of conflicting objectives can be resolved by training two generators: one that maximizes return, and another that captures the “remainder” of the data distribution in the offline dataset, such that the mixture of the two is close to the behavior policy. We show that having two generators not only enables an effective GAN-based offline RL method, but also approximates a support constraint, where the policy does not need to match the entire data distribution, but only the slice of the data that leads to high long-term performance. We name our method DASCO, for Dual-Generator Adversarial Support Constrained Offline RL. On benchmark tasks that require learning from sub-optimal data, DASCO significantly outperforms prior methods that enforce distribution constraints.
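The mixture idea can be illustrated with a toy calculation. The sketch below is not the DASCO implementation; it only uses the standard GAN result that, for fixed generators, the optimal discriminator is D*(a) = p_data(a) / (p_data(a) + p_fake(a)). The four-action space, the uniform behavior policy, the 50/50 mixture weight, and the function names `mixture` and `optimal_discriminator` are all assumptions made for illustration.

```python
def mixture(policy, aux, weight=0.5):
    """Mix two action distributions; `weight` is the (assumed) share of the policy."""
    return [weight * p + (1.0 - weight) * q for p, q in zip(policy, aux)]


def optimal_discriminator(data, fake):
    """Standard GAN optimum for fixed generators: D*(a) = p_data / (p_data + p_fake)."""
    return [d / (d + f) if (d + f) > 0 else 0.5 for d, f in zip(data, fake)]


# Hypothetical single-state problem with four discrete actions.
behavior = [0.25, 0.25, 0.25, 0.25]  # behavior policy that produced the dataset
policy = [0.5, 0.5, 0.0, 0.0]        # return-maximizing generator: a slice of the support
aux = [0.0, 0.0, 0.5, 0.5]           # second generator: the "remainder" of the data

mix = mixture(policy, aux)

# Discriminator that compares the data against the policy alone.
d_single = optimal_discriminator(behavior, policy)

# Discriminator that compares the data against the mixture of both generators.
d_mix = optimal_discriminator(behavior, mix)

print("policy only:", d_single)
print("mixture:    ", d_mix)  # 0.5 for every action
```

In this toy setting, the discriminator facing the policy alone departs from 0.5 on every action, so a single generator would be pushed to spread probability over the whole dataset. Against the mixture, the discriminator is 0.5 everywhere even though the policy covers only two of the four actions, which is the sense in which the dual-generator setup resembles a support constraint rather than a distribution-matching constraint.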

