
Maximum Entropy Reinforcement Learning with Mixture Policies

by Nir Baram, et al.

Mixture models are an expressive hypothesis class that can approximate a rich set of policies. However, using mixture policies in the Maximum Entropy (MaxEnt) framework is not straightforward. The entropy of a mixture model is not the weighted sum of its components' entropies, nor does it have a closed-form expression in most cases. Using such policies in MaxEnt algorithms therefore requires a tractable approximation of the mixture entropy. In this paper, we derive a simple, low-variance mixture-entropy estimator and show that it is closely related to the sum of marginal entropies. Equipped with this estimator, we extend Soft Actor-Critic (SAC) to mixture policies and evaluate the resulting algorithm on a series of continuous control tasks.
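To make the abstract's claim concrete, the sketch below (an illustration of the underlying fact, not the paper's estimator) uses a small, hypothetical 1-D Gaussian mixture to show that the mixture entropy has no simple closed form but is sandwiched between the weighted sum of component entropies and that sum plus the entropy of the mixing weights; a Monte Carlo estimate of the true entropy falls between the two bounds. All numeric values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D Gaussian mixture: weights, means, std devs (illustrative).
w = np.array([0.5, 0.5])
mu = np.array([-2.0, 2.0])
sigma = np.array([1.0, 1.0])

def log_mixture_pdf(x):
    # log p(x) = log sum_k w_k * N(x; mu_k, sigma_k), vectorized over samples x
    comp = (-0.5 * ((x[:, None] - mu) / sigma) ** 2
            - np.log(sigma * np.sqrt(2 * np.pi)))
    return np.log(np.exp(comp) @ w)

# Monte Carlo entropy estimate: H(p) = -E_{x~p}[log p(x)]
n = 200_000
k = rng.choice(len(w), size=n, p=w)   # draw a component index per sample
x = rng.normal(mu[k], sigma[k])       # then sample from that component
H_mc = -log_mixture_pdf(x).mean()

# Closed-form Gaussian component entropies: 0.5 * log(2*pi*e*sigma^2)
H_k = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)
lower = w @ H_k                       # weighted sum of marginal entropies
upper = lower - w @ np.log(w)         # ... plus the entropy of the weights

# The mixture entropy lies between the two bounds (up to MC noise).
assert lower <= H_mc <= upper
```

The gap between the bounds is at most the entropy of the mixing weights (here log 2), which is why estimators built around the sum of marginal entropies can be both cheap and tight.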


Implicit Policy for Reinforcement Learning

We introduce Implicit Policy, a general class of expressive policies tha...

A series of maximum entropy upper bounds of the differential entropy

We present a series of closed-form maximum entropy upper bounds for the ...

Simplex NeuPL: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games

Learning to play optimally against any mixture over a diverse set of str...

GALILEO: A Generalized Low-Entropy Mixture Model

We present a new method of generating mixture models for data with categ...

Contextual Policy Reuse using Deep Mixture Models

Reinforcement learning methods that consider the context, or current sta...

Enforcing KL Regularization in General Tsallis Entropy Reinforcement Learning via Advantage Learning

Maximum Tsallis entropy (MTE) framework in reinforcement learning has ga...