Sparsity Prior Regularized Q-learning for Sparse Action Tasks

05/18/2021
by Jing-Cheng Pang, et al.

In many decision-making tasks, certain actions are limited in their frequency or total amount, such as "fire" in gunfight games and "buy/sell" in stock trading. We refer to such actions as "sparse actions". Sparse actions often play a crucial role in achieving good performance. However, their Q-values, estimated with the classical Bellman update, usually suffer from large estimation error due to the sparsity of their samples. The greedy policy can be severely misled by the biased Q-function and take sparse actions too aggressively, which leads to substantial sub-optimality. This paper constructs a reference distribution that assigns low probability to sparse actions and proposes a regularized objective with an explicit constraint toward the reference distribution. Furthermore, we derive a regularized Bellman operator and a regularized optimal policy that slow down the propagation of estimation error and guide the agent to take sparse actions more carefully. Experimental results demonstrate that our method achieves state-of-the-art performance on typical sparse action tasks.
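The abstract describes the method only at a high level, so the following is a minimal sketch of one common way a KL-style regularization toward a reference distribution can enter a Q-learning backup. The reference weights rho, the temperature tau, the sparse-action index, and both helper functions are illustrative assumptions, not the authors' implementation; the paper's exact operator and policy may differ.

```python
# Minimal sketch (not the authors' code) of a KL-regularized Bellman backup
# toward a reference distribution rho that down-weights a designated sparse action.
# All names (rho, tau, SPARSE_ACTION, ...) are illustrative assumptions.

import numpy as np

n_actions = 4
SPARSE_ACTION = 0          # index of the frequency-limited ("sparse") action
tau = 0.1                  # assumed regularization temperature
gamma = 0.99               # discount factor

# Reference distribution: low probability on the sparse action, uniform elsewhere.
rho = np.full(n_actions, 1.0)
rho[SPARSE_ACTION] = 0.05
rho /= rho.sum()

def regularized_backup(reward, q_next):
    """One KL-regularized backup:
    target = r + gamma * tau * log sum_a' rho(a') * exp(Q(s', a') / tau)."""
    m = q_next.max()  # shift for numerical stability of the log-sum-exp
    soft_value = tau * np.log(np.sum(rho * np.exp((q_next - m) / tau))) + m
    return reward + gamma * soft_value

def regularized_policy(q_values):
    """Regularized policy: pi(a|s) proportional to rho(a) * exp(Q(s, a) / tau)."""
    logits = np.log(rho) + q_values / tau
    logits -= logits.max()
    probs = np.exp(logits)
    return probs / probs.sum()

# Example: a possibly over-estimated sparse-action Q-value is tempered by its low rho weight.
q_next = np.array([1.2, 1.0, 0.9, 0.8])
print(regularized_backup(reward=0.1, q_next=q_next))
print(regularized_policy(q_next))
```

In this form, the low reference weight on the sparse action damps both the bootstrapped target and the probability of selecting that action, which mirrors the abstract's goal of slowing error propagation and taking sparse actions more carefully.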

