Soft Q-network

12/20/2019
by   Jingbin Liu, et al.

When DeepMind announced DQN in 2013, the whole world was surprised by its simplicity and promising results, but the method's low efficiency and instability make it hard to apply to many problems. In the years since, people have proposed increasingly complicated improvements, many of which rely on distributed deep RL that needs a large number of cores to run the simulators. However, the basic idea behind these techniques is often just a modified DQN. So we asked a simple question: is there a more elegant way to improve the DQN model? Instead of adding more and more small fixes to it, we redesign the problem setting under a popular entropy regularization framework, which leads to better performance and a theoretical guarantee. Finally, we propose SQN, a new off-policy algorithm with better performance and stability.

research
12/02/2019

On-policy Reinforcement Learning with Entropy Regularization

Entropy regularization is an important idea in reinforcement learning, wi...
research
10/26/2021

EnTRPO: Trust Region Policy Optimization Method with Entropy Regularization

Trust Region Policy Optimization (TRPO) is a popular and empirically suc...
research
01/28/2022

Do You Need the Entropy Reward (in Practice)?

Maximum entropy (MaxEnt) RL maximizes a combination of the original task...
research
11/13/2020

Reinforcement Learning Control of Constrained Dynamic Systems with Uniformly Ultimate Boundedness Stability Guarantee

Reinforcement learning (RL) is promising for complicated stochastic nonl...
research
05/14/2019

Control Regularization for Reduced Variance Reinforcement Learning

Dealing with high variance is a significant challenge in model-free rein...
research
06/07/2017

Imposing Hard Constraints on Deep Networks: Promises and Limitations

Imposing constraints on the output of a Deep Neural Net is one way to im...
research
02/05/2023

Refined Value-Based Offline RL under Realizability and Partial Coverage

In offline reinforcement learning (RL) we have no opportunity to explore...
