Learning Robust Options by Conditional Value at Risk Optimization

05/22/2019
by   Takuya Hiraoka, et al.

Options are generally learned by using an inaccurate environment model (or simulator), which contains uncertain model parameters. While there are several methods to learn options that are robust against the uncertainty of model parameters, these methods only consider either the worst case or the average (ordinary) case for learning options. This limited consideration of the cases often produces options that do not work well in the unconsidered case. In this paper, we propose a conditional value at risk (CVaR)-based method to learn options that work well in both the average and worst cases. We extend the CVaR-based policy gradient method proposed by Chow and Ghavamzadeh (2014) to deal with robust Markov decision processes and then apply the extended method to learning robust options. We conduct experiments to evaluate our method in multi-joint robot control tasks (HopperIceBlock, Half-Cheetah, and Walker2D). Experimental results show that our method produces options that 1) give better worst-case performance than the options learned only to minimize the average-case loss, and 2) give better average-case performance than the options learned only to minimize the worst-case loss.
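
To make the three criteria in the abstract concrete, here is a minimal sketch of how average-case loss, worst-case loss, and CVaR differ on a set of sampled losses. It is not the paper's policy gradient method. The function name `empirical_cvar`, the sample values, and the confidence level `alpha = 0.8` are all illustrative assumptions, and the estimator used is the simple "mean of the worst (1 - alpha) fraction of samples" form.

```python
import math


def empirical_cvar(losses, alpha):
    """Illustrative estimator: mean of the worst (1 - alpha) fraction of losses.

    `losses` are sampled losses (higher is worse), e.g. one per sampled
    model parameter. `alpha` in [0, 1) is the confidence level.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if not losses:
        raise ValueError("losses must be non-empty")
    ordered = sorted(losses, reverse=True)  # worst losses first
    # Number of tail samples; rounding guards against float error in (1 - alpha) * n.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:k]) / k


# Hypothetical losses of one option under ten sampled model parameters.
losses = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 9.0, 19.0]

average_case = sum(losses) / len(losses)      # ignores the heavy tail
worst_case = max(losses)                      # looks only at the single worst sample
tail_risk = empirical_cvar(losses, alpha=0.8)  # mean of the worst 20% of samples

print(average_case, tail_risk, worst_case)
```

With `alpha = 0` this estimator reduces to the average-case loss, and as `alpha` approaches 1 it approaches the worst sampled loss, which is why a CVaR objective can sit between the two criteria the abstract contrasts.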

