Improved Strongly Polynomial Algorithms for Deterministic MDPs, 2VPI Feasibility, and Discounted All-Pairs Shortest Paths
We revisit the problem of finding optimal strategies for deterministic Markov Decision Processes (DMDPs) and the closely related problem of testing the feasibility of systems of $m$ linear inequalities on $n$ real variables with at most two variables per inequality (2VPI). We give a randomized trade-off algorithm that solves both problems in $\tilde{O}(nmh + (n/h)^3)$ time using $\tilde{O}(n^2/h + m)$ space, for any parameter $h \in [1, n]$. In particular, using subquadratic space we obtain $\tilde{O}(nm + n^{3/2}m^{3/4})$ running time, which improves by a polynomial factor upon all known upper bounds for non-dense instances with $m = O(n^{2-\epsilon})$. Moreover, using linear space we match the randomized $\tilde{O}(nm + n^3)$ time bound of Cohen and Megiddo [SICOMP'94], which required $\tilde{\Theta}(n^2 + m)$ space. Additionally, we present a new algorithm for the Discounted All-Pairs Shortest Paths problem, introduced by Madani et al. [TALG'10], which extends DMDPs with optional end vertices. For the case of uniform discount factors, we give a deterministic algorithm running in $\tilde{O}(n^{3/2}m^{3/4})$ time, which improves significantly upon the randomized $\tilde{O}(n^2\sqrt{m})$ bound of Madani et al.
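The subquadratic-space bound can be recovered by balancing the two terms of the trade-off. The worked arithmetic below is only a sketch derived from the bounds stated above, not the paper's own derivation; it assumes $1 \le m \le n^2$, which guarantees that the chosen $h = n^{1/2} m^{-1/4}$ lies in $[1, n]$.

$$
\begin{aligned}
nmh = \left(\tfrac{n}{h}\right)^3 &\iff h^4 = \tfrac{n^2}{m} \iff h = n^{1/2} m^{-1/4},\\
\text{time: } nmh + \left(\tfrac{n}{h}\right)^3 &= 2\, n^{3/2} m^{3/4},\\
\text{space: } \tfrac{n^2}{h} + m &= n^{3/2} m^{1/4} + m,\\
m = O(n^{2-\epsilon}) &\implies n^{3/2} m^{1/4} + m = O(n^{2-\epsilon/4}).
\end{aligned}
$$

Since $nm \le n^{3/2} m^{3/4}$ holds exactly when $m \le n^2$, the $n^{3/2} m^{3/4}$ term dominates for simple graphs. The same condition gives $n^{3/2} m^{3/4} \le n^2 \sqrt{m}$, so the Discounted All-Pairs Shortest Paths bound gains a factor of $n^{1/2} m^{-1/4}$ over the earlier $\tilde{O}(n^2\sqrt{m})$ bound.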