A Regularized Implicit Policy for Offline Reinforcement Learning

02/19/2022
by Shentao Yang, et al.

Offline reinforcement learning enables learning from a fixed dataset, without further interaction with the environment. The lack of environmental interaction makes policy training vulnerable to state-action pairs far from the training dataset and prone to missing rewarding actions. To train more effective agents, we propose a framework that supports learning a flexible yet well-regularized fully-implicit policy. We further propose a simple modification to classical policy-matching methods for regularizing with respect to the dual form of the Jensen–Shannon divergence and the integral probability metrics. We show theoretically that the policy-matching approach is correct, and that our modification is correct and has a good finite-sample property. We provide an effective instantiation of our framework through the GAN structure, together with techniques that explicitly smooth the state-action mapping for robust generalization beyond the static dataset. Extensive experiments and an ablation study on the D4RL datasets validate our framework and the effectiveness of our algorithmic designs.
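As a rough illustration of the "dual form of the Jensen–Shannon divergence" mentioned above, the sketch below checks numerically that the standard GAN-style variational objective, evaluated at its optimal discriminator, recovers the JS divergence. It is not the authors' implementation: the two toy discrete distributions stand in for the behavior and learned policies, and the function names (`js_divergence`, `js_dual_value`, `optimal_discriminator`) are hypothetical.

```python
import math

def js_divergence(p, q):
    """Direct Jensen-Shannon divergence between discrete distributions (natural log)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(x, y):
        # KL(x || y), skipping zero-probability terms of x
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def js_dual_value(p, q, d):
    """Dual (variational) objective: log 2 + 0.5 * (E_p[log d] + E_q[log(1 - d)]).

    `d` lists discriminator outputs in (0, 1), one per outcome.
    For any such d this is a lower bound on the JS divergence.
    """
    e_p = sum(a * math.log(di) for a, di in zip(p, d) if a > 0)
    e_q = sum(b * math.log(1 - di) for b, di in zip(q, d) if b > 0)
    return math.log(2) + 0.5 * (e_p + e_q)

def optimal_discriminator(p, q):
    """Maximizer of the dual objective: d*(x) = p(x) / (p(x) + q(x))."""
    return [a / (a + b) for a, b in zip(p, q)]

# Toy distributions (assumed for illustration only).
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

direct = js_divergence(p, q)
dual_at_optimum = js_dual_value(p, q, optimal_discriminator(p, q))
dual_uninformed = js_dual_value(p, q, [0.5, 0.5, 0.5])
```

Here `dual_at_optimum` matches `direct`, while the uninformed discriminator (0.5 everywhere) yields only a looser lower bound. In a GAN-style instantiation the discriminator would be a trained network rather than a closed-form ratio, so the bound is only approximately tight.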

