Distributional Soft Actor Critic for Risk Sensitive Learning

04/30/2020
by Xiaoteng Ma, et al.

Most reinforcement learning (RL) algorithms aim to maximize the expected accumulated discounted return. Since the accumulated discounted return is a random variable, its distribution carries more information than its expectation alone. Meanwhile, the entropy of a policy indicates its diversity and can help improve an algorithm's exploration capability. In this paper, we present a new RL algorithm named Distributional Soft Actor Critic (DSAC), which combines distributional RL and maximum entropy RL. By taking into account the randomness in both actions and discounted returns, DSAC outperforms state-of-the-art baselines, with greater stability, on several continuous control benchmarks. Moreover, the distributional information about returns can also be used to evaluate metrics other than the expectation, such as risk-related metrics. With a fully parameterized quantile function, DSAC can easily be adapted to optimize policies under different risk preferences. Our experiments demonstrate that, with distribution modeling in RL, the agent performs better in both risk-averse and risk-seeking control tasks.
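The abstract does not spell out DSAC's update equations, so the following is only a minimal sketch of the two ideas it combines, under stated assumptions. It assumes a SAC-style entropy bonus, -α log π(a'|s'), added to every quantile of the next-state return to form a soft distributional target. It also assumes that risk preferences are expressed by distorting the quantile level τ before reading off the return: CVaR of the lower tail for risk-averse control, the mean of the upper tail for risk-seeking control. The function names (`soft_target_quantiles`, `distorted_value`, `cvar_distortion`, `upper_tail_distortion`) and the toy uniform return distribution are hypothetical illustrations, not the authors' implementation.

```python
from typing import Callable, List


def soft_target_quantiles(reward: float, gamma: float, next_quantiles: List[float],
                          next_log_prob: float, alpha: float,
                          done: bool = False) -> List[float]:
    """Hypothetical soft distributional Bellman target, one value per quantile.

    Assumption: each next-state quantile z is shifted by the entropy bonus
    -alpha * log pi(a'|s'), discounted by gamma, and added to the reward.
    """
    if done:
        # Terminal transition: the target return is a point mass at the reward.
        return [reward for _ in next_quantiles]
    return [reward + gamma * (z - alpha * next_log_prob) for z in next_quantiles]


def cvar_distortion(eta: float) -> Callable[[float], float]:
    """Risk-averse: map tau in [0, 1] onto the lower tail [0, eta]."""
    return lambda tau: eta * tau


def upper_tail_distortion(eta: float) -> Callable[[float], float]:
    """Risk-seeking: map tau in [0, 1] onto the upper tail [1 - eta, 1]."""
    return lambda tau: 1.0 - eta + eta * tau


def distorted_value(quantile_fn: Callable[[float], float],
                    distortion: Callable[[float], float], n: int = 1000) -> float:
    """Approximate E_{tau ~ U[0,1]}[Z(distortion(tau))] by midpoint quadrature.

    With the identity distortion this is the ordinary expected return.
    """
    total = 0.0
    for i in range(n):
        tau = (i + 0.5) / n  # midpoint of the i-th of n equal slices of [0, 1]
        total += quantile_fn(distortion(tau))
    return total / n


if __name__ == "__main__":
    # Hypothetical return distribution: uniform on [0, 10], so Z(tau) = 10 * tau.
    z = lambda tau: 10.0 * tau
    print(distorted_value(z, lambda tau: tau))          # close to 5.0 (mean)
    print(distorted_value(z, cvar_distortion(0.25)))        # close to 1.25 (lower 25%)
    print(distorted_value(z, upper_tail_distortion(0.25)))  # close to 8.75 (upper 25%)
```

With the same learned quantile function, switching only the distortion moves the agent's objective from the mean to a pessimistic or optimistic summary of the return. This is one common way a quantile-based critic can support different risk preferences without retraining the return model.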
