Policy Evaluation in Distributional LQR

03/23/2023
by Zifan Wang, et al.

Distributional reinforcement learning (DRL) improves our understanding of how randomness in the environment affects outcomes by letting agents learn the distribution of a random return, rather than just its expected value as in standard RL. However, a main challenge in DRL is that policy evaluation typically relies on a representation of the return distribution, and this representation must be carefully designed. In this paper, we address this challenge for a special class of DRL problems that use the linear quadratic regulator (LQR) for control, and we advocate a new distributional approach to LQR, which we call distributional LQR. Specifically, we provide a closed-form expression for the distribution of the random return which, remarkably, applies to any exogenous disturbances on the dynamics, as long as they are independent and identically distributed (i.i.d.). Although the exact return distribution involves infinitely many random variables, we show that it can be approximated using a finite number of random variables, and that the associated approximation error can be bounded analytically under mild assumptions. Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR that uses the Conditional Value at Risk (CVaR) as the risk measure. Numerical experiments illustrate our theoretical results.
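The pipeline described above (evaluate the random return of a linear policy, then lower its CVaR using only function evaluations) can be illustrated with a small sketch. It is not the paper's method. Plain Monte Carlo sampling stands in for the closed-form return distribution, and the horizon is truncated, which only loosely mirrors the idea of keeping finitely many disturbance terms. Everything specific here is a hypothetical assumption: the scalar system x_{t+1} = a x_t + b u_t + w_t with Gaussian w_t, the discounted quadratic cost, the parameter values (a, b, q, r, gamma, sigma, horizon), the CVaR convention (mean of the worst alpha-fraction of costs), and the helper names (sample_return, empirical_cvar, cvar_of_gain, zeroth_order_grad, risk_averse_descent).

```python
import math
import random


def sample_return(k, a=0.9, b=1.0, q=1.0, r=0.1, gamma=0.95,
                  x0=1.0, sigma=0.1, horizon=100, rng=None):
    """Draw one truncated discounted return (a cost) of the policy u = -k x.

    Hypothetical scalar system: x_{t+1} = a x_t + b u_t + w_t, w_t i.i.d. N(0, sigma^2).
    Stopping after `horizon` steps means only finitely many w_t enter the return.
    """
    rng = rng if rng is not None else random.Random()
    x, ret, disc = x0, 0.0, 1.0
    for _ in range(horizon):
        u = -k * x
        ret += disc * (q * x * x + r * u * u)  # discounted stage cost q x^2 + r u^2
        x = a * x + b * u + rng.gauss(0.0, sigma)
        disc *= gamma
    return ret


def empirical_cvar(costs, alpha):
    """Mean of the worst (largest) ceil(alpha * n) costs: a simple CVaR estimate."""
    worst = sorted(costs, reverse=True)
    m = max(1, math.ceil(alpha * len(worst) - 1e-9))  # tolerance guards float round-up
    return sum(worst[:m]) / m


def cvar_of_gain(k, alpha, n=200, seed=0):
    # Common random numbers: sample i always uses seed + i, so nearby gains see the same noise.
    costs = [sample_return(k, rng=random.Random(seed + i)) for i in range(n)]
    return empirical_cvar(costs, alpha)


def zeroth_order_grad(k, alpha, delta=0.05, seed=0):
    """Two-point estimate of dCVaR/dk from CVaR evaluations only (no model derivatives).

    For a scalar gain, a two-point zeroth-order estimator reduces to this central difference.
    """
    return (cvar_of_gain(k + delta, alpha, seed=seed)
            - cvar_of_gain(k - delta, alpha, seed=seed)) / (2.0 * delta)


def risk_averse_descent(k0=0.0, alpha=0.1, step=0.005, iters=20):
    """Plain gradient descent on the sampled CVaR, with fresh noise seeds per iteration."""
    k = k0
    for it in range(iters):
        k -= step * zeroth_order_grad(k, alpha, seed=1000 * it)
    return k
```

For example, `risk_averse_descent()` starts from k = 0 and should move the gain toward values with a much lower sampled CVaR. The step size is kept small because, for these hypothetical parameters, the estimated gradient near k = 0 is steep, and a large step can push the closed-loop factor a - b k outside the stable range.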
