Robustness and risk management via distributional dynamic programming

12/28/2021
by Mastane Achab, et al.

In dynamic programming (DP) and reinforcement learning (RL), an agent learns to act optimally in terms of expected long-term return by sequentially interacting with its environment, which is modeled as a Markov decision process (MDP). More generally, in distributional reinforcement learning (DRL), the focus is on the whole distribution of the return, not just its expectation. Although DRL-based methods have produced state-of-the-art performance in RL with function approximation, they involve additional quantities (compared to the non-distributional setting) that are still not well understood. As a first contribution, we introduce a new class of distributional operators, together with a practical DP algorithm for policy evaluation, that come with a robust MDP interpretation. Indeed, our approach reformulates the problem through an augmented state space where each state is split into a worst-case substate and a best-case substate, whose values are maximized by safe and risky policies respectively. Finally, we derive distributional operators and DP algorithms solving a new control task: how to distinguish safe from risky optimal actions in order to break ties in the space of optimal policies?
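To make the contrast between the expected return and the full return distribution concrete, here is a minimal sketch of policy evaluation with the standard distributional Bellman operator. It is not the new class of operators introduced in the paper. The two-state Markov reward process, the names `TRANSITIONS`, `GAMMA`, `distributional_backup`, `expected_backup` and `evaluate`, and all numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical two-state Markov reward process induced by a fixed policy.
# TRANSITIONS[s] is a list of (probability, reward, next_state) outcomes.
TRANSITIONS = {
    "A": [(0.5, 1.0, "B"), (0.5, 0.0, "A")],
    "B": [(1.0, 2.0, "A")],
}
GAMMA = 0.5  # discount factor (assumed value)


def distributional_backup(eta):
    """Apply the standard distributional Bellman operator once.

    eta maps each state to a dict {return value: probability}.
    """
    new_eta = {}
    for s, outcomes in TRANSITIONS.items():
        dist = defaultdict(float)
        for p, r, s_next in outcomes:
            # Push the next-state distribution through z -> r + GAMMA * z.
            for z, q in eta[s_next].items():
                dist[r + GAMMA * z] += p * q
        new_eta[s] = dict(dist)
    return new_eta


def expected_backup(v):
    """Apply the classical (expected-value) Bellman operator once."""
    return {
        s: sum(p * (r + GAMMA * v[s_next]) for p, r, s_next in outcomes)
        for s, outcomes in TRANSITIONS.items()
    }


def mean(dist):
    """Expectation of a finite distribution {value: probability}."""
    return sum(z * q for z, q in dist.items())


def evaluate(horizon):
    """Run both backups for `horizon` steps, starting from zero return."""
    eta = {s: {0.0: 1.0} for s in TRANSITIONS}
    v = {s: 0.0 for s in TRANSITIONS}
    for _ in range(horizon):
        eta = distributional_backup(eta)
        v = expected_backup(v)
    return eta, v


if __name__ == "__main__":
    eta, v = evaluate(horizon=5)
    for s in TRANSITIONS:
        print(s, "expected return:", v[s], "mean of distribution:", mean(eta[s]))
        print(s, "support size:", len(eta[s]))
```

In this sketch the mean of each return distribution coincides with the classical value, by linearity of expectation, while the distribution itself also carries the spread of possible returns that the expectation discards.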


research
09/17/2021

Exploring the Robustness of Distributional Reinforcement Learning against Noisy State Observations

In real scenarios, state observations that an agent observes may contain...
research
04/27/2023

One-Step Distributional Reinforcement Learning

Reinforcement learning (RL) allows an agent interacting sequentially wit...
research
10/21/2022

Group Distributionally Robust Reinforcement Learning with Hierarchical Latent Variables

One key challenge for multi-task Reinforcement learning (RL) in practice...
research
02/01/2022

Distributional Reinforcement Learning via Sinkhorn Iterations

Distributional reinforcement learning (RL) is a class of state-of-the-ar...
research
03/23/2023

Policy Evaluation in Distributional LQR

Distributional reinforcement learning (DRL) enhances the understanding o...
research
04/19/2023

Robust Route Planning with Distributional Reinforcement Learning in a Stochastic Road Network Environment

Route planning is essential to mobile robot navigation problems. In rece...
research
11/04/2019

Lookahead Bayesian Optimization via Rollout: Guarantees and Sequential Rolling Horizons

Lookahead, also known as non-myopic, Bayesian optimization (BO) aims to ...
