
Quantile Reinforcement Learning
In reinforcement learning, the standard criterion to evaluate policies i...

Tight Bayesian Ambiguity Sets for Robust MDPs
Robustness is important for sequential decision making in a stochastic d...

Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Markov decision processes (MDPs) are the de facto framework for sequenti...

Revisiting Peng's Q(λ) for Modern Reinforcement Learning
Off-policy multi-step reinforcement learning algorithms consist of conse...

Average-Reward Reinforcement Learning with Trust Region Methods
Most reinforcement learning algorithms optimize the discounted criter...

Outside the Echo Chamber: Optimizing the Performative Risk
In performative prediction, predictions guide decision-making and hence ...

Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers
We introduce a criterion, resilience, which allows properties of a datas...

Soft-Robust Algorithms for Handling Model Misspecification
In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion, which minimizes the probability of a catastrophic failure. Unfortunately, such policies are typically overly conservative as the percentile criterion is non-convex, difficult to optimize, and ignores the mean performance. To overcome these shortcomings, we study the soft-robust criterion, which uses risk measures to balance the mean and percentile criteria better. In this paper, we establish the soft-robust criterion's fundamental properties, show that it is NP-hard to optimize, and propose and analyze two algorithms to optimize it approximately. Our theoretical analyses and empirical evaluations demonstrate that our algorithms compute much less conservative solutions than the existing approximate methods for optimizing the percentile criterion.
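The abstract describes a criterion that balances the mean and percentile criteria through a risk measure. A minimal sketch of that idea, assuming one common instantiation (a convex combination of the mean return and the CVaR, i.e. the average of the worst alpha-fraction of returns across sampled models); the function name and the `weight` and `alpha` parameters are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def soft_robust_objective(returns, weight=0.5, alpha=0.05):
    """Illustrative soft-robust score over returns sampled from
    plausible models: weight * mean + (1 - weight) * CVaR_alpha.
    CVaR_alpha is computed here as the mean of the worst
    ceil(alpha * n) returns."""
    returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(returns))))
    cvar = returns[:k].mean()  # average of the alpha-tail (worst outcomes)
    return weight * returns.mean() + (1.0 - weight) * cvar
```

Setting `weight=1.0` recovers the pure mean criterion, while `weight=0.0` recovers a pure tail-risk (percentile-style) criterion; intermediate values trade off the two, which is the balance the soft-robust criterion aims for.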