Offline Reinforcement Learning with Pseudometric Learning

03/02/2021
by Robert Dadashi, et al.

Offline reinforcement learning methods seek to learn a policy from logged transitions of an environment, without any interaction. With function approximation, and when the logged data covers only a limited part of the state-action space, the policy must be constrained to visit state-action pairs close to the support of the logged transitions. In this work, we propose an iterative procedure to learn a pseudometric (closely related to bisimulation metrics) from logged transitions, and use it to define this notion of closeness. We show that the procedure converges and extend it to the function approximation setting. We then use this pseudometric to define a new lookup-based bonus in an actor-critic algorithm: PLOff. This bonus encourages the actor to stay close, in terms of the defined pseudometric, to the support of logged transitions. Finally, we evaluate the method on hand manipulation and locomotion tasks.
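The idea of a lookup-based bonus can be illustrated with a small sketch. It is not the PLOff implementation: the abstract does not specify how the bonus is computed, so the code below assumes the bonus is the negated distance from a queried state-action pair to its nearest logged pair, and it substitutes plain Euclidean distance on concatenated state-action vectors for the learned pseudometric. The names `euclidean`, `lookup_bonus`, and `logged_pairs` are hypothetical.

```python
import math


def euclidean(u, v):
    # Placeholder distance (assumption): stands in for the learned pseudometric.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def lookup_bonus(state, action, logged_pairs, distance=euclidean):
    # Concatenate state and action into a single query vector.
    query = tuple(state) + tuple(action)
    # Look up the nearest logged state-action pair under the given distance.
    nearest = min(distance(query, tuple(s) + tuple(a)) for s, a in logged_pairs)
    # Negate so that pairs far from the logged support are penalized.
    return -nearest


# Toy logged dataset (assumption): two state-action pairs.
logged_pairs = [((0.0, 0.0), (1.0,)), ((1.0, 1.0), (0.0,))]

in_support = lookup_bonus((0.0, 0.0), (1.0,), logged_pairs)
off_support = lookup_bonus((3.0, 4.0), (1.0,), logged_pairs)
```

In this sketch a pair that appears in the logged data receives a bonus of zero, and the bonus decreases as the queried pair moves away from the data. An actor-critic method could add such a term to the actor's objective, although how PLOff combines the two is not stated in the abstract.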


research
02/26/2018

Addressing Function Approximation Error in Actor-Critic Methods

In value-based reinforcement learning methods such as deep Q-learning, f...
research
08/19/2021

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Actor-critic methods are widely used in offline reinforcement learning p...
research
05/17/2021

Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

Offline Reinforcement Learning promises to learn effective policies from...
research
11/14/2020

PLAS: Latent Action Space for Offline Reinforcement Learning

The goal of offline reinforcement learning is to learn a policy from a f...
research
10/09/2019

Investigation on the generalization of the Sampled Policy Gradient algorithm

The Sampled Policy Gradient (SPG) algorithm is a new offline actor-criti...
research
04/19/2023

CASOG: Conservative Actor-critic with SmOoth Gradient for Skill Learning in Robot-Assisted Intervention

Robot-assisted intervention has shown reduced radiation exposure to phys...
research
05/28/2019

Generation of Policy-Level Explanations for Reinforcement Learning

Though reinforcement learning has greatly benefited from the incorporati...
