Behavior Constraining in Weight Space for Offline Reinforcement Learning

07/12/2021
by Phillip Swazinna, et al.

In offline reinforcement learning, a policy must be learned from a single pre-collected dataset. Policies are therefore typically regularized during training to behave similarly to the data-generating policy, by adding a penalty based on a divergence between the action distributions of the generating and the trained policy. We propose a new algorithm that instead constrains the policy directly in its weight space, and demonstrate its effectiveness in experiments.
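To make the distinction concrete, the sketch below contrasts the two regularization styles in PyTorch. It is illustrative only and not the authors' implementation: the names policy, bc_policy (a behavior-cloning reference trained on the dataset), rl_objective, and the coefficient lambda_w are assumptions, and the squared L2 distance is just one possible choice of weight-space constraint.

import torch
import torch.nn as nn

def weight_space_penalty(policy: nn.Module, bc_policy: nn.Module):
    # Squared L2 distance between the trainable policy weights and the frozen
    # weights of a behavior-cloning reference policy with the same architecture.
    # (Illustrative sketch; names and penalty form are assumptions.)
    dist = 0.0
    for p, p_bc in zip(policy.parameters(), bc_policy.parameters()):
        dist = dist + ((p - p_bc.detach()) ** 2).sum()
    return dist

# Illustrative training step: instead of penalizing a divergence between
# action distributions (e.g., a KL term on pi(a|s)), penalize distance from
# the reference policy in weight space.
#   loss = rl_objective(batch, policy) + lambda_w * weight_space_penalty(policy, bc_policy)
#   loss.backward(); optimizer.step()

A hard variant of the same idea would project the weights back into a ball around the reference weights after each update instead of adding a soft penalty; which form the paper uses is detailed in the full text.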


Related research

10/02/2021 · BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning
Online interactions with the environment to collect data samples for tra...

07/10/2023 · Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
In some applications of reinforcement learning, a dataset of pre-collect...

02/19/2022 · A Regularized Implicit Policy for Offline Reinforcement Learning
Offline reinforcement learning enables learning from a fixed dataset, wi...

01/19/2019 · Towards Physically Safe Reinforcement Learning under Supervision
This paper addresses the question of how a previously available control ...

02/22/2022 · Reward-Free Policy Space Compression for Reinforcement Learning
In reinforcement learning, we encode the potential behaviors of an agent...

11/26/2021 · Measuring Data Quality for Dataset Selection in Offline Reinforcement Learning
Recently developed offline reinforcement learning algorithms have made i...

01/03/2019 · Imminent Collision Mitigation with Reinforcement Learning and Vision
This work examines the role of reinforcement learning in reducing the se...
