Multi-Agent Fully Decentralized Off-Policy Learning with Linear Convergence Rates

10/17/2018
by Lucas Cassano, et al.

In this paper we develop a fully decentralized algorithm for policy evaluation with off-policy learning, linear function approximation, and O(n) computation and memory complexity. The proposed algorithm is a variance-reduced method and achieves a linear convergence rate. We consider the setting where a collection of agents, each holding a distinct, fixed-size dataset gathered under its own behavior policy (none of which is required to explore the full state space), collaborate to evaluate a common target policy. The networked approach allows all agents to converge to the optimal solution even in situations where no agent could converge on its own without cooperation. We provide simulations to illustrate the effectiveness of the method.
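The cooperation pattern the abstract describes can be illustrated with a minimal sketch: each agent takes an importance-weighted local TD step on its own off-policy dataset, then averages its parameters with its network neighbors through a doubly stochastic combination matrix. Note this is a plain diffusion-TD(0) sketch of the decentralized setup, not the paper's variance-reduced algorithm; the toy MDP, feature matrix, and policies below are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical, not the authors' experiments): S states,
# linear features phi(s) in R^d, N_AGENTS agents on a ring network.
S, d, N_AGENTS, T = 6, 3, 4, 2000
gamma = 0.9
Phi = rng.normal(size=(S, d))            # rows are feature vectors phi(s)

# Common target policy and distinct per-agent behavior policies (2 actions),
# all bounded away from zero so importance ratios stay bounded.
pi = np.full((S, 2), 0.5)
behavior_probs = [0.3, 0.4, 0.6, 0.7]
mu = [np.tile([p, 1.0 - p], (S, 1)) for p in behavior_probs]

def step(s, a):
    """Toy deterministic transition/reward model shared by all agents."""
    s_next = (s + a + 1) % S
    return s_next, 1.0 if s_next == 0 else 0.0

# Each agent gathers a fixed-size dataset under its own behavior policy.
datasets = []
for k in range(N_AGENTS):
    data, s = [], int(rng.integers(S))
    for _ in range(500):
        a = int(rng.choice(2, p=mu[k][s]))
        s_next, r = step(s, a)
        data.append((s, a, r, s_next))
        s = s_next
    datasets.append(data)

# Doubly stochastic combination matrix for a ring topology.
A = np.zeros((N_AGENTS, N_AGENTS))
for k in range(N_AGENTS):
    A[k, k] = 0.5
    A[k, (k - 1) % N_AGENTS] = 0.25
    A[k, (k + 1) % N_AGENTS] = 0.25

# Decentralized off-policy TD(0): local importance-weighted step,
# then a consensus (diffusion) combination with neighbors.
W = np.zeros((N_AGENTS, d))              # one parameter vector per agent
alpha = 0.05
for _ in range(T):
    W_half = np.empty_like(W)
    for k in range(N_AGENTS):
        s, a, r, s_next = datasets[k][rng.integers(len(datasets[k]))]
        rho = pi[s, a] / mu[k][s, a]     # importance-sampling ratio
        delta = r + gamma * Phi[s_next] @ W[k] - Phi[s] @ W[k]
        W_half[k] = W[k] + alpha * rho * delta * Phi[s]
    W = A @ W_half                       # combine with neighbors
```

The combination step is what lets an agent whose own behavior policy never visits part of the state space still benefit from neighbors that do; the paper's contribution replaces the plain TD step above with a variance-reduced update that yields linear convergence.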
