Soft-Robust Algorithms for Handling Model Misspecification

11/30/2020
by Elita A. Lobo, et al.

In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion, which minimizes the probability of a catastrophic failure. Unfortunately, such policies are typically overly conservative because the percentile criterion is non-convex, difficult to optimize, and ignores mean performance. To overcome these shortcomings, we study the soft-robust criterion, which uses risk measures to better balance the mean and percentile criteria. In this paper, we establish the soft-robust criterion's fundamental properties, show that it is NP-hard to optimize, and propose and analyze two algorithms that optimize it approximately. Our theoretical analyses and empirical evaluations demonstrate that our algorithms compute much less conservative solutions than existing approximate methods for optimizing the percentile criterion.
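
To make the trade-off between the mean and percentile criteria concrete, here is a minimal sketch of one way such a balance could be scored from sampled returns. The abstract does not name the risk measure or the exact objective, so everything here is an assumption for illustration: the risk measure is taken to be the conditional value at risk (CVaR) of the lower tail, the two terms are combined with a weight `lam`, and the function names `cvar` and `soft_robust_score` are hypothetical. It only evaluates a fixed set of returns; it is not one of the paper's two optimization algorithms.

```python
import numpy as np


def cvar(returns, alpha):
    """Average of the worst alpha-fraction of sampled returns (lower tail).

    Assumption: CVaR stands in for the unnamed "risk measure".
    """
    sorted_returns = np.sort(np.asarray(returns, dtype=float))
    # Keep at least one sample so the tail is never empty.
    k = max(1, int(np.ceil(alpha * len(sorted_returns))))
    return float(sorted_returns[:k].mean())


def soft_robust_score(returns, alpha, lam):
    """Hypothetical soft-robust score: lam * mean + (1 - lam) * CVaR_alpha.

    lam = 1 recovers the mean criterion; lam = 0 scores only the tail.
    """
    returns = np.asarray(returns, dtype=float)
    return lam * float(returns.mean()) + (1.0 - lam) * cvar(returns, alpha)


if __name__ == "__main__":
    # Returns of one policy under models sampled from a posterior
    # (synthetic numbers, for illustration only).
    rng = np.random.default_rng(0)
    sampled_returns = rng.normal(loc=10.0, scale=3.0, size=1000)
    for lam in (0.0, 0.5, 1.0):
        print(lam, soft_robust_score(sampled_returns, alpha=0.1, lam=lam))
```

Under these assumptions, moving `lam` from 0 to 1 shifts the score from tail-focused (more conservative) to purely mean-focused, which is the kind of balance the abstract describes.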
