Contrastive Value Learning: Implicit Models for Simple Offline RL

11/03/2022
by Bogdan Mazoure, et al.

Model-based reinforcement learning (RL) methods are appealing in the offline setting because they allow an agent to reason about the consequences of actions without interacting with the environment. Prior methods learn a 1-step dynamics model, which predicts the next state given the current state and action. These models do not immediately tell the agent which actions to take, but must be integrated into a larger RL framework. Can we model the environment dynamics in a different way, such that the learned model does directly indicate the value of each action? In this paper, we propose Contrastive Value Learning (CVL), which learns an implicit, multi-step model of the environment dynamics. This model can be learned without access to reward functions, but nonetheless can be used to directly estimate the value of each action, without requiring any TD learning. Because this model represents the multi-step transitions implicitly, it avoids having to predict high-dimensional observations and thus scales to high-dimensional tasks. Our experiments demonstrate that CVL outperforms prior offline RL methods on complex continuous control benchmarks.
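The abstract describes CVL only at a high level. Below is a minimal NumPy sketch of how a contrastive, implicit multi-step model could yield action values without TD learning. It rests on three assumptions of ours. First, a "future" state is sampled at a geometrically distributed offset with discount gamma. Second, a critic score f(s, a, s_f) = phi(s, a) · psi(s_f) is trained with an InfoNCE loss that uses the other batch entries as negatives. Third, Q(s, a) is approximated by a softmax-weighted average of rewards at candidate future states, scaled by 1 / (1 - gamma). The function names, the dot-product critic, and the self-normalized value estimate are illustrative assumptions, not the paper's implementation. No training loop is shown.

```python
import numpy as np

def sample_future_index(t, traj_len, gamma, rng):
    """Sample a future step t + k with k ~ Geometric(1 - gamma), k >= 1.
    Clipping to the last step of a finite trajectory is an assumption."""
    k = rng.geometric(1.0 - gamma)  # numpy's geometric has support {1, 2, ...}
    return min(t + int(k), traj_len - 1)

def infonce_loss(phi_sa, psi_future):
    """InfoNCE loss: row i of phi_sa is paired with row i of psi_future
    (the positive); all other rows in the batch serve as negatives."""
    logits = phi_sa @ psi_future.T                        # (B, B) critic scores
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # mean NLL of positives

def contrastive_q_value(phi_sa, psi_candidates, rewards, gamma):
    """Approximate Q(s, a) = E[r(s_f)] / (1 - gamma), with the expectation over
    future states estimated by softmax(f(s, a, s_f^j)) weights over candidate
    states s_f^j drawn from the dataset (a self-normalized estimate)."""
    scores = psi_candidates @ phi_sa                      # (N,) critic scores
    w = np.exp(scores - scores.max())
    w /= w.sum()                                          # weights sum to 1
    return float(w @ rewards) / (1.0 - gamma)
```

In practice phi and psi would be neural encoders trained by minimizing `infonce_loss` on batches of (s, a, s_f) triples built with `sample_future_index`; here they are plain arrays so the sketch runs as-is. Note that the reward model only enters at value-estimation time, which mirrors the abstract's point that the dynamics model itself is learned without rewards.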
