Improving Generalization of Reinforcement Learning with Minimax Distributional Soft Actor-Critic

02/13/2020
by Yangang Ren, et al.

Reinforcement learning (RL) has achieved remarkable performance in a variety of sequential decision-making and control tasks. However, a common problem is that a learned, nearly optimal policy often overfits to the training environment and may not generalize to situations never encountered during training. In practical applications, the randomness of the environment can lead to rare but devastating events, which should be the focus of safety-critical systems such as autonomous driving. In this paper, we introduce the minimax formulation and the distributional framework to improve the generalization ability of RL algorithms, and we develop the Minimax Distributional Soft Actor-Critic (Minimax DSAC) algorithm. The minimax formulation seeks an optimal policy under the most severe disturbances from the environment: the protagonist policy maximizes the action-value function while the adversary policy tries to minimize it. The distributional framework learns a state-action return distribution, from which we can model the risk of different returns explicitly and thereby formulate a risk-averse protagonist policy and a risk-seeking adversarial policy. We implement our method on the decision-making tasks of autonomous vehicles at intersections and test the trained policy in environments distinct from the training environment. Results demonstrate that our method can greatly improve the generalization ability of the protagonist agent to different environmental variations.
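To make the interplay between the minimax formulation and the return distribution concrete, the following is a minimal tabular sketch, not the Minimax DSAC algorithm itself. It assumes that risk is measured as the mean return minus a multiple of its standard deviation, and that the adversary picks the disturbance minimizing that quantity; the abstract does not specify either choice. The action names, disturbance names, sampled returns, and the weights `k_pro` and `k_adv` are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical sampled returns for each (protagonist action, adversary
# disturbance) pair. The numbers are invented for illustration only.
RETURNS = {
    "go": {"calm": [10.0, 10.0, 10.0], "aggressive": [12.0, -20.0, 11.0]},
    "yield": {"calm": [6.0, 6.0, 6.0], "aggressive": [5.0, 4.0, 6.0]},
}


def risk_adjusted(samples, k):
    """Mean return minus k standard deviations (an assumed risk measure)."""
    return mean(samples) - k * pstdev(samples)


def adversary_response(by_disturbance, k_adv):
    """Disturbance that minimizes the risk-adjusted return for a fixed action."""
    return min(by_disturbance, key=lambda d: risk_adjusted(by_disturbance[d], k_adv))


def minimax_choice(returns, k_pro=1.0, k_adv=1.0):
    """Pick the action whose risk-adjusted return is best under the adversary's reply."""
    best = None
    for action, by_disturbance in returns.items():
        disturbance = adversary_response(by_disturbance, k_adv)
        value = risk_adjusted(by_disturbance[disturbance], k_pro)
        if best is None or value > best[2]:
            best = (action, disturbance, value)
    return best


best_action, worst_disturbance, value = minimax_choice(RETURNS)
print(best_action, worst_disturbance, round(value, 2))
```

With these invented numbers, an agent that considers only the calm case would prefer `go` (return 10 versus 6), whereas the risk-adjusted minimax choice is `yield`, because `go` has a low mean and high variance under the aggressive disturbance.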
