Worst-Case Control and Learning Using Partial Observations Over an Infinite Time-Horizon

03/28/2023
by Aditya Dave, et al.

Safety-critical cyber-physical systems require control strategies whose worst-case performance is robust against adversarial disturbances and modeling uncertainties. In this paper, we present a framework for approximate control and learning in partially observed systems to minimize the worst-case discounted cost over an infinite time horizon. We model disturbances to the system as finite-valued uncertain variables with unknown probability distributions. For problems with known system dynamics, we construct a dynamic programming (DP) decomposition to compute the optimal control strategy. Our first contribution is to define information states that improve the computational tractability of this DP without loss of optimality. Then, we describe a simplification for a class of problems where the incurred cost is observable at each time instance. Our second contribution is defining an approximate information state that can be constructed or learned directly from observed data for problems with observable costs. We derive bounds on the performance loss of the resulting approximate control strategy and illustrate the effectiveness of our approach in partially observed decision-making problems with a numerical example.
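To make the worst-case discounted objective concrete, the following is a minimal sketch of the standard minimax Bellman recursion, V(x) = min_u max_w [c(x, u, w) + γ V(f(x, u, w))], on a small fully observed toy system. It is not the paper's algorithm: the line-world dynamics, the cost, the discount factor, and all function names are assumptions chosen for illustration. In the partially observed setting described above, the state x would be replaced by an information state (or an approximate one).

```python
def minimax_value_iteration(states, actions, disturbances, step, cost,
                            gamma=0.9, tol=1e-8, max_iter=10_000):
    """Worst-case value iteration: V(x) = min_u max_w [cost + gamma * V(next)]."""

    def q_value(V, x, u):
        # Worst case over all finite-valued disturbances for a fixed action u.
        return max(cost(x, u, w) + gamma * V[step(x, u, w)] for w in disturbances)

    V = {x: 0.0 for x in states}
    for _ in range(max_iter):
        V_new = {x: min(q_value(V, x, u) for u in actions) for x in states}
        gap = max(abs(V_new[x] - V[x]) for x in states)
        V = V_new
        if gap < tol:
            break

    # Greedy strategy with respect to the converged worst-case value function.
    policy = {x: min(actions, key=lambda u: q_value(V, x, u)) for x in states}
    return V, policy


# Hypothetical toy problem: positions 0..4 on a line, target position 2.
STATES = list(range(5))
ACTIONS = [-1, 0, 1]
DISTURBANCES = [-1, 0, 1]  # finite-valued, no probability distribution assumed


def step(x, u, w):
    # Next position, clipped to the ends of the line.
    return min(max(x + u + w, 0), 4)


def cost(x, u, w):
    # Distance from the target plus a small control penalty.
    return abs(x - 2) + 0.1 * abs(u)


V, policy = minimax_value_iteration(STATES, ACTIONS, DISTURBANCES, step, cost)
print({x: round(v, 3) for x, v in V.items()})
print(policy)
```

Because the inner maximization ranges over disturbance values rather than taking an expectation, no probability distribution is needed, which matches the abstract's modeling of disturbances as uncertain variables with unknown distributions.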

research
01/12/2023

Approximate Information States for Worst-Case Control and Learning in Uncertain Systems

In this paper, we investigate discrete-time decision-making problems in ...
research
06/26/2023

Beyond dynamic programming

In this paper, we present Score-life programming, a novel theoretical ap...
research
06/10/2021

Differentiable Robust LQR Layers

This paper proposes a differentiable robust LQR layer for reinforcement ...
research
12/17/2014

Optimal Triggering of Networked Control Systems

The problem of resource allocation of nonlinear networked control system...
research
02/19/2021

Learning to Stop with Surprisingly Few Samples

We consider a discounted infinite horizon optimal stopping problem. If t...
research
06/04/2019

Robust exploration in linear quadratic reinforcement learning

This paper concerns the problem of learning control policies for an unkn...
research
11/09/2020

Multiagent Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair Problems

In this paper we consider infinite horizon discounted dynamic programmin...
