Implicitly Regularized RL with Implicit Q-Values

08/16/2021
by   Nino Vieillard, et al.
0

The Q-function is a central quantity in many Reinforcement Learning (RL) algorithms for which RL agents behave following a (soft)-greedy policy w.r.t. to Q. It is a powerful tool that allows action selection without a model of the environment and even without explicitly modeling the policy. Yet, this scheme can only be used in discrete action tasks, with small numbers of actions, as the softmax cannot be computed exactly otherwise. Especially the usage of function approximation, to deal with continuous action spaces in modern actor-critic architectures, intrinsically prevents the exact computation of a softmax. We propose to alleviate this issue by parametrizing the Q-function implicitly, as the sum of a log-policy and of a value function. We use the resulting parametrization to derive a practical off-policy deep RL algorithm, suitable for large action spaces, and that enforces the softmax relation between the policy and the Q-value. We provide a theoretical analysis of our algorithm: from an Approximate Dynamic Programming perspective, we show its equivalence to a regularized version of value iteration, accounting for both entropy and Kullback-Leibler regularization, and that enjoys beneficial error propagation results. We then evaluate our algorithm on classic control tasks, where its results compete with state-of-the-art methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/28/2017

Bridging the Gap Between Value and Policy Based Reinforcement Learning

We establish a new connection between value and policy based reinforceme...
research
11/05/2019

Quinoa: a Q-function You Infer Normalized Over Actions

We present an algorithm for learning an approximate action-value soft Q-...
research
10/19/2020

Softmax Deep Double Deterministic Policy Gradients

A widely-used actor-critic reinforcement learning algorithm for continuo...
research
02/28/2023

The In-Sample Softmax for Offline Reinforcement Learning

Reinforcement learning (RL) agents can leverage batches of previously co...
research
12/04/2015

Q-Networks for Binary Vector Actions

In this paper reinforcement learning with binary vector actions was inve...
research
03/31/2020

Leverage the Average: an Analysis of Regularization in RL

Building upon the formalism of regularized Markov decision processes, we...
research
05/18/2021

Sparsity Prior Regularized Q-learning for Sparse Action Tasks

In many decision-making tasks, some specific actions are limited in thei...

Please sign up or login with your details

Forgot password? Click here to reset