Convex Programs and Lyapunov Functions for Reinforcement Learning: A Unified Perspective on the Analysis of Value-Based Methods

02/14/2022
by Xingang Guo, et al.

Value-based methods play a fundamental role in Markov decision processes (MDPs) and reinforcement learning (RL). In this paper, we present a unified control-theoretic framework for analyzing value-based methods such as value computation (VC), value iteration (VI), and temporal difference (TD) learning (with linear function approximation). Building on an intrinsic connection between value-based methods and dynamic systems, we directly use existing convex testing conditions from control theory to derive various convergence results for these methods. These testing conditions are convex programs in the form of either linear programming (LP) or semidefinite programming (SDP), and can be solved to construct Lyapunov functions in a straightforward manner. Our analysis reveals some intriguing connections between feedback control systems and RL algorithms. It is our hope that such connections can inspire more work at the intersection of system/control theory and RL.
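To make the LP-style idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's own formulation: the 3-state transition matrix `P`, reward vector `R`, discount `GAMMA`, and the helper names are all hypothetical. It treats value computation as the linear system v <- R + GAMMA * P v, checks a linear feasibility condition (a vector p > 0 with GAMMA * P p < p elementwise), and confirms numerically that the weighted max-norm built from p decreases along the error trajectory like a Lyapunov function.

```python
# Hypothetical 3-state Markov reward process; all numbers are illustrative.
GAMMA = 0.9
P = [[0.5, 0.5, 0.0],
     [0.1, 0.6, 0.3],
     [0.2, 0.0, 0.8]]
R = [1.0, 0.0, 2.0]
N = len(R)


def step(v):
    """One value-computation update: v <- R + GAMMA * P v."""
    return [R[i] + GAMMA * sum(P[i][j] * v[j] for j in range(N))
            for i in range(N)]


def certificate_ok(p):
    """Linear feasibility check: p > 0 and GAMMA * P p < p elementwise."""
    Ap = [GAMMA * sum(P[i][j] * p[j] for j in range(N)) for i in range(N)]
    return all(pi > 0 for pi in p) and all(Ap[i] < p[i] for i in range(N))


def lyapunov(e, p):
    """Weighted max-norm V(e) = max_i |e_i| / p_i."""
    return max(abs(ei) / pi for ei, pi in zip(e, p))


# Approximate the fixed point by iterating long enough for the error to vanish.
v_star = [0.0] * N
for _ in range(5000):
    v_star = step(v_star)

# Because P is row-stochastic, the all-ones vector satisfies the condition.
p = [1.0] * N
feasible = certificate_ok(p)

# Track the Lyapunov value of the error e_k = v_k - v_star along a trajectory.
v = [10.0, -5.0, 3.0]
values = []
for _ in range(50):
    values.append(lyapunov([vi - si for vi, si in zip(v, v_star)], p))
    v = step(v)

# Each step should shrink the Lyapunov value by at least the factor GAMMA.
decreasing = all(values[k + 1] <= GAMMA * values[k] + 1e-9
                 for k in range(len(values) - 1))
print(feasible, decreasing)
```

In this toy case the certificate is found by inspection; in general one would hand the same linear inequalities to an LP solver. The SDP conditions mentioned in the abstract for TD learning are not covered by this sketch.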

research
06/03/2016

Difference of Convex Functions Programming Applied to Control with Expert Data

This paper reports applications of Difference of Convex functions (DC) p...
research
04/22/2022

Analysis of Temporal Difference Learning: Linear System Approach

The goal of this technical note is to introduce a new finite-time conver...
research
01/15/2020

Lipschitz Lifelong Reinforcement Learning

We consider the problem of knowledge transfer when an agent is facing a ...
research
07/20/2020

Lagrangian Duality in Reinforcement Learning

Although duality is used extensively in certain fields, such as supervis...
research
06/29/2023

Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning

We propose a novel value approximation method, namely Eigensubspace Regu...
research
10/18/2019

On Connections between Constrained Optimization and Reinforcement Learning

Dynamic Programming (DP) provides standard algorithms to solve Markov De...
research
05/28/2019

Conditions on Features for Temporal Difference-Like Methods to Converge

The convergence of many reinforcement learning (RL) algorithms with line...
