Maximum Entropy Reinforcement Learning with Mixture Policies

03/18/2021
by Nir Baram, et al.

Mixture models are an expressive hypothesis class that can approximate a rich set of policies. However, using mixture policies in the Maximum Entropy (MaxEnt) framework is not straightforward: the entropy of a mixture model is not equal to the sum of the entropies of its components, and in most cases it has no closed-form expression. Using such policies in MaxEnt algorithms therefore requires a tractable approximation of the mixture entropy. In this paper, we derive a simple, low-variance mixture-entropy estimator and show that it is closely related to the sum of marginal entropies. Equipped with this estimator, we derive a variant of Soft Actor-Critic (SAC) for the mixture-policy case and evaluate it on a series of continuous control tasks.
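To illustrate why the mixture entropy needs an approximation, the sketch below compares a naive Monte Carlo estimate of the entropy of a one-dimensional Gaussian mixture with the weighted sum of its component entropies. This is not the estimator derived in the paper; it is a generic sampling baseline, and the function names, mixture parameters, and sample count are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_entropy(sigma):
    """Closed-form differential entropy (in nats) of a 1-D Gaussian."""
    return 0.5 * np.log(2.0 * np.pi * np.e * sigma**2)


def mixture_log_pdf(x, weights, means, sigmas):
    """Log-density of a 1-D Gaussian mixture, via a stable log-sum-exp."""
    x = np.asarray(x)[:, None]
    log_comp = (
        -0.5 * ((x - means) / sigmas) ** 2
        - np.log(sigmas)
        - 0.5 * np.log(2.0 * np.pi)
    )
    a = np.log(weights) + log_comp
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]


def mc_mixture_entropy(weights, means, sigmas, n_samples=200_000, seed=0):
    """Naive Monte Carlo estimate of H = -E[log p(x)] under the mixture."""
    rng = np.random.default_rng(seed)
    # Sample a component index, then a point from that component.
    idx = rng.choice(len(weights), size=n_samples, p=weights)
    x = rng.normal(means[idx], sigmas[idx])
    return -mixture_log_pdf(x, weights, means, sigmas).mean()


# Hypothetical two-component mixture with well-separated modes.
weights = np.array([0.5, 0.5])
means = np.array([-4.0, 4.0])
sigmas = np.array([1.0, 1.0])

weighted_component_entropy = float(np.sum(weights * gaussian_entropy(sigmas)))
weight_entropy = float(-np.sum(weights * np.log(weights)))
estimate = float(mc_mixture_entropy(weights, means, sigmas))

print("weighted sum of component entropies:", weighted_component_entropy)
print("Monte Carlo mixture entropy:        ", estimate)
print("upper bound (add entropy of weights):", weighted_component_entropy + weight_entropy)
```

In this setup the mixture entropy exceeds the weighted component sum and stays below that sum plus the entropy of the mixing weights. When the components coincide, the gap closes and the mixture entropy equals that of a single component.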


