Convex duality for stochastic shortest path problems in known and unknown environments

07/31/2022
by Kelli Francis-Staite, et al.

This paper studies Stochastic Shortest Path (SSP) problems in known and unknown environments from the perspective of convex optimisation. It first recalls results for the known-parameter case and develops understanding through alternative proofs. It then turns to the unknown-parameter case, where it studies extended value iteration (EVI) operators. These include the existing operators used by Rosenberg et al. [26] and Tarbouriech et al. [31], based on the ℓ1 norm and the supremum norm, as well as newly defined EVI operators corresponding to other norms and divergences, such as the KL-divergence. The paper shows in general how the EVI operators relate to convex programs and gives the form of their duals, for which strong duality holds. It then asks whether the bounds from the finite-horizon work of Neu and Pike-Burke [21] can be applied to these EVI operators in the SSP setting. It shows that bounds similar to those of [21] exist for these operators, but that they lead to operators that are not, in general, monotone and that have more complex convergence properties. In one special case, oscillating behaviour is observed. The paper closes with open questions on how the research may progress, along with several examples that require further examination.
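To make the idea of an extended value iteration operator concrete, here is a minimal sketch and not the paper's exact construction. It assumes a tabular SSP whose last state index is a cost-free goal, and an ℓ1 confidence ball of fixed radius around an estimated transition vector. All names (`bellman_step`, `optimistic_transition`, `evi_step`, `radius`) are hypothetical. In the cited works the radius is derived from visit counts, whereas here it is simply a constant.

```python
def bellman_step(v, costs, probs):
    """One value-iteration step for a known SSP; the goal has value 0."""
    ext = list(v) + [0.0]  # last index is the goal state
    return [
        min(c + sum(p * x for p, x in zip(row, ext))
            for c, row in zip(costs[s], probs[s]))
        for s in range(len(v))
    ]


def optimistic_transition(p_hat, values, radius):
    """Minimise p . values over the simplex subject to ||p - p_hat||_1 <= radius.

    Standard greedy solution: move up to radius/2 of mass onto the
    lowest-value state, taking it from the highest-value states first.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    p = list(p_hat)
    best = order[0]
    p[best] = min(1.0, p_hat[best] + radius / 2.0)
    excess = sum(p) - 1.0
    for i in reversed(order):
        if excess <= 1e-12:
            break
        if i == best:
            continue
        removed = min(p[i], excess)
        p[i] -= removed
        excess -= removed
    return p


def evi_step(v, costs, p_hat, radius):
    """One optimistic step: Bellman update with the best transition in the ball."""
    ext = list(v) + [0.0]
    new_v = []
    for s in range(len(v)):
        q_values = []
        for c, row in zip(costs[s], p_hat[s]):
            p = optimistic_transition(row, ext, radius)
            q_values.append(c + sum(pi * x for pi, x in zip(p, ext)))
        new_v.append(min(q_values))
    return new_v


def iterate(step, n_states, iters=200):
    """Apply an operator repeatedly, starting from the zero vector."""
    v = [0.0] * n_states
    for _ in range(iters):
        v = step(v)
    return v


# Toy instance (assumed for illustration): one non-goal state, one action,
# unit cost, estimated probability 0.5 of staying and 0.5 of reaching the goal.
costs = [[1.0]]
p_hat = [[[0.5, 0.5]]]
v_known = iterate(lambda v: bellman_step(v, costs, p_hat), 1)
v_opt = iterate(lambda v: evi_step(v, costs, p_hat, 0.2), 1)
```

On this toy instance the optimistic value is no larger than the value computed from the estimated transitions, which is the sense in which such operators are optimistic. The sketch does not reproduce the Neu and Pike-Burke style bounds discussed in the abstract, nor the non-monotone behaviour reported for them.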


