On Sample Complexity of Projection-Free Primal-Dual Methods for Learning Mixture Policies in Markov Decision Processes

by   Masoud Badiei Khuzani, et al.

We study the problem of learning policy of an infinite-horizon, discounted cost, Markov decision process (MDP) with a large number of states. We compute the actions of a policy that is nearly as good as a policy chosen by a suitable oracle from a given mixture policy class characterized by the convex hull of a set of known base policies. To learn the coefficients of the mixture model, we recast the problem as an approximate linear programming (ALP) formulation for MDPs, where the feature vectors correspond to the occupation measures of the base policies defined on the state-action space. We then propose a projection-free stochastic primal-dual method with the Bregman divergence to solve the characterized ALP. Furthermore, we analyze the probably approximately correct (PAC) sample complexity of the proposed stochastic algorithm, namely the number of queries required to achieve near optimal objective value. We also propose a modification of our proposed algorithm with the polytope constraint sampling for the smoothed ALP, where the restriction to lower bounding approximations are relaxed. In addition, we apply the proposed algorithms to a queuing problem, and compare their performance with a penalty function algorithm. The numerical results illustrates that the primal-dual achieves better efficiency and low variance across different trials compared to the penalty function method.


page 1

page 2

page 3

page 4


Primal-Dual π Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems

Consider the problem of approximating the optimal policy of a Markov dec...

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

We study the estimation of risk-sensitive policies in reinforcement lear...

Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization

We propose a new stochastic primal-dual optimization algorithm for plann...

Optimizing over a Restricted Policy Class in Markov Decision Processes

We address the problem of finding an optimal policy in a Markov decision...

Scalable Bilinear π Learning Using State and Action Features

Approximate linear programming (ALP) represents one of the major algorit...

A Gradient-Aware Search Algorithm for Constrained Markov Decision Processes

The canonical solution methodology for finite constrained Markov decisio...

Primal-dual regression approach for Markov decision processes with general state and action space

We develop a regression based primal-dual martingale approach for solvin...

Please sign up or login with your details

Forgot password? Click here to reset