
Soft-Robust Algorithms for Handling Model Misspecification

by Elita A. Lobo, et al.

In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion, which minimizes the probability of a catastrophic failure. Unfortunately, such policies are typically overly conservative: the percentile criterion is non-convex, difficult to optimize, and ignores mean performance. To overcome these shortcomings, we study the soft-robust criterion, which uses risk measures to better balance the mean and percentile criteria. In this paper, we establish the soft-robust criterion's fundamental properties, show that it is NP-hard to optimize, and propose and analyze two algorithms to optimize it approximately. Our theoretical analyses and empirical evaluations demonstrate that our algorithms compute much less conservative solutions than existing approximate methods for optimizing the percentile criterion.
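To make the balancing idea concrete, here is a minimal sketch of one common soft-robust-style objective: a convex combination of the mean return and a tail risk measure (CVaR) computed over returns sampled from an uncertain model. The function name, the choice of CVaR, and the weighting parameter `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def soft_robust_value(returns, lam=0.5, alpha=0.2):
    """Illustrative soft-robust objective (an assumed form, not the paper's):
    lam * mean(returns) + (1 - lam) * CVaR_alpha(returns),
    where CVaR_alpha is the mean over the worst alpha-fraction of samples."""
    r = np.sort(np.asarray(returns, dtype=float))  # ascending: worst returns first
    k = max(1, int(np.ceil(alpha * len(r))))       # size of the worst alpha-tail
    cvar = r[:k].mean()                            # conditional value at risk
    return lam * r.mean() + (1.0 - lam) * cvar

# With lam = 1 this reduces to the risk-neutral mean criterion;
# with lam = 0 it is purely tail-focused, in the spirit of the
# conservative percentile criterion the abstract contrasts against.
value = soft_robust_value([1.0, 2.0, 3.0, 4.0], lam=0.5, alpha=0.5)
```

Sweeping `lam` between 0 and 1 traces out the trade-off between average performance and robustness that the soft-robust criterion is designed to control.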



