Reinforcement Learning with Dynamic Boltzmann Softmax Updates

03/14/2019
by   Ling Pan, et al.
0

Value function estimation is an important task in reinforcement learning, i.e., prediction. The commonly used operator for prediction in Q-learning is the hard max operator, which always commits to the maximum action-value according to current estimation. Such `hard' updating scheme results in pure exploitation and may lead to misbehavior due to noise in stochastic environments. Thus, it is critical to balancing exploration and exploitation in value function estimation. The Boltzmann softmax operator has a greater capability in exploring potential action-values. However, it does not satisfy the non-expansion property, and its direct use may fail to converge even in value iteration. In this paper, we propose to update the value function with dynamic Boltzmann softmax (DBS) operator in value function estimation, which has good convergence property in the setting of planning and learning. Moreover, we prove that dynamic Boltzmann softmax updates can eliminate the overestimation phenomenon introduced by the hard max operator. Experimental results on GridWorld show that the DBS operator enables convergence and a better trade-off between exploration and exploitation in value function estimation. Finally, we propose the DBS-DQN algorithm by generalizing the dynamic Boltzmann softmax update in deep Q-network, which outperforms DQN substantially in 40 out of 49 Atari games.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/16/2016

An Alternative Softmax Operator for Reinforcement Learning

A softmax operator applied to a set of values acts somewhat like the max...
research
12/02/2018

Revisiting the Softmax Bellman Operator: Theoretical Properties and Practical Benefits

The softmax function has been primarily employed in reinforcement learni...
research
10/19/2020

Softmax Deep Double Deterministic Policy Gradients

A widely-used actor-critic reinforcement learning algorithm for continuo...
research
06/06/2013

Direct Uncertainty Estimation in Reinforcement Learning

Optimal probabilistic approach in reinforcement learning is computationa...
research
11/25/2022

Operator Splitting Value Iteration

We introduce new planning and reinforcement learning algorithms for disc...
research
12/19/2019

Uncertainty-sensitive Learning and Planning with Ensembles

We propose a reinforcement learning framework for discrete environments ...
research
11/05/2019

Robo-advising: Learning Investor's Risk Preferences via Portfolio Choices

We introduce a reinforcement learning framework for retail robo-advising...

Please sign up or login with your details

Forgot password? Click here to reset