Behavior Constraining in Weight Space for Offline Reinforcement Learning

07/12/2021


In offline reinforcement learning, a policy must be learned from a single pre-collected dataset. Policies are therefore typically regularized during training to behave similarly to the data-generating policy, by adding a penalty based on a divergence between the action distributions of the generating and the trained policy. We propose a new algorithm that instead constrains the policy directly in its weight space, and demonstrate its effectiveness in experiments.
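
The contrast between the two kinds of regularization can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the algorithm from the paper: the policies are deterministic and linear, a mean squared action difference stands in for a divergence between action distributions, and a squared Euclidean distance between parameter vectors stands in for a weight-space constraint. All function names and numeric values are hypothetical.

```python
# Illustrative sketch only. The penalty forms and names below are assumptions
# chosen for clarity; they are not taken from the paper.

def linear_policy(weights):
    """Return a deterministic 1-D linear policy a = w0 * s + w1."""
    return lambda s: [weights[0] * s + weights[1]]

def action_space_penalty(policy, behavior_policy, states):
    """Mean squared action difference over dataset states.

    A simple stand-in for a divergence between action distributions.
    """
    total = 0.0
    for s in states:
        total += sum((a - b) ** 2 for a, b in zip(policy(s), behavior_policy(s)))
    return total / len(states)

def weight_space_penalty(weights, reference_weights):
    """Squared Euclidean distance between two flat parameter vectors."""
    return sum((w - r) ** 2 for w, r in zip(weights, reference_weights))

# Hypothetical parameters: a reference policy fitted to the data, and a
# policy that has drifted away from it during training.
behavior_w = [1.0, 0.0]
trained_w = [1.5, -0.5]
states = [0.0, 1.0, 2.0]  # hypothetical dataset states

action_pen = action_space_penalty(
    linear_policy(trained_w), linear_policy(behavior_w), states
)
weight_pen = weight_space_penalty(trained_w, behavior_w)

print("action-space penalty:", action_pen)
print("weight-space penalty:", weight_pen)
```

The action-space penalty depends on which states appear in the dataset, whereas the weight-space penalty depends only on the two parameter vectors. In a training loop, either term would be scaled by a coefficient and added to the policy objective.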
