Revisiting the Softmax Bellman Operator: Theoretical Properties and Practical Benefits

12/02/2018
by Zhao Song, et al.

The softmax function has primarily been used in reinforcement learning (RL) to improve exploration and to provide a differentiable approximation to the max function, as also observed in the mellowmax paper by Asadi and Littman. This paper instead focuses on using the softmax function in the Bellman updates, independent of the exploration strategy. Our main theory provides a performance bound for the softmax Bellman operator and shows that it converges to the standard Bellman operator exponentially fast in the inverse temperature parameter. We also prove that, under certain conditions, the softmax operator can reduce both the overestimation error and the gradient noise. A detailed comparison among different Bellman operators then illustrates the trade-offs in selecting them. We apply the softmax operator to deep RL by combining it with the deep Q-network (DQN) and double DQN algorithms in an off-policy fashion, and demonstrate that these variants often achieve better performance on several Atari games and compare favorably to their mellowmax counterparts.
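The softmax Bellman operator described above replaces the hard max over next-state action values with a Boltzmann-weighted average governed by an inverse temperature β; as β → ∞ the update recovers the standard Bellman backup, and at β = 0 it reduces to a plain average. A minimal NumPy sketch of a single such backup for one transition (function names and the example values are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def softmax_weighted_value(q_values, beta):
    """Boltzmann-weighted average of Q-values.

    Approaches max(q_values) as beta -> inf, and the
    plain mean of q_values at beta = 0.
    """
    z = q_values - q_values.max()   # subtract max for numerical stability
    w = np.exp(beta * z)
    w /= w.sum()                    # softmax weights over actions
    return float(np.dot(w, q_values))

def softmax_bellman_target(reward, gamma, next_q, beta):
    """One softmax Bellman backup: r + gamma * softmax-weighted next value."""
    return reward + gamma * softmax_weighted_value(next_q, beta)

# Illustrative transition: reward 0.5, discount 0.99, three next-state actions.
next_q = np.array([1.0, 2.0, 3.0])
low_beta_target = softmax_bellman_target(0.5, 0.99, next_q, beta=1.0)
high_beta_target = softmax_bellman_target(0.5, 0.99, next_q, beta=100.0)
# At high beta the target is close to the standard (max-based) Bellman target.
```

In a DQN-style implementation, this backup would simply replace the `max` over the target network's output when forming the TD target, leaving the rest of the training loop unchanged.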


Related research

- An Alternative Softmax Operator for Reinforcement Learning (12/16/2016)
- A t-distribution based operator for enhancing out of distribution robustness of neural network classifiers (06/09/2020)
- Reinforcement Learning with Dynamic Boltzmann Softmax Updates (03/14/2019)
- The In-Sample Softmax for Offline Reinforcement Learning (02/28/2023)
- Analysis of Softmax Approximation for Deep Classifiers under Input-Dependent Label Noise (03/15/2020)
- Bayesian Bellman Operators (06/09/2021)
- Generalized Speedy Q-learning (11/01/2019)
