Eric Mazumdar

is this you? claim profile


  • Policy-Gradient Algorithms Have No Guarantees of Convergence in Continuous Action and State Multi-Agent Settings

    We show by counterexample that policy-gradient algorithms have no guarantees of even local convergence to Nash equilibria in continuous action and state space multi-agent settings. To do so, we analyze gradient-play in N-player general-sum linear quadratic games. In such games the state and action spaces are continuous and the unique global Nash equilibrium can be found be solving coupled Ricatti equations. Further, gradient-play in LQ games is equivalent to multi-agent policy gradient. We first prove that the only critical point of the gradient dynamics in these games is the unique global Nash equilibrium. We then give sufficient conditions under which policy gradient will avoid the Nash equilibrium, and generate a large number of general-sum linear quadratic games that satisfy these conditions. The existence of such games indicates that one of the most popular approaches to solving reinforcement learning problems in the classic reinforcement learning setting has no guarantee of convergence in multi-agent settings. Further, the ease with which we can generate these counterexamples suggests that such situations are not mere edge cases and are in fact quite common.

    07/08/2019 ∙ by Eric Mazumdar, et al. ∙ 1 share

    read it

  • Inverse Risk-Sensitive Reinforcement Learning

    We address the problem of inverse reinforcement learning in Markov decision processes where the agent is risk-sensitive. In particular, we model risk-sensitivity in a reinforcement learning framework by making use of models of human decision-making having their origins in behavioral psychology, behavioral economics, and neuroscience. We propose a gradient-based inverse reinforcement learning algorithm that minimizes a loss function defined on the observed behavior. We demonstrate the performance of the proposed technique on two examples, the first of which is the canonical Grid World example and the second of which is a Markov decision process modeling passengers' decisions regarding ride-sharing. In the latter, we use pricing and travel time data from a ride-sharing company to construct the transition probabilities and rewards of the Markov decision process.

    03/29/2017 ∙ by Lillian J. Ratliff, et al. ∙ 0 share

    read it

  • On the Convergence of Competitive, Multi-Agent Gradient-Based Learning

    As learning algorithms are increasingly deployed in markets and other competitive environments, understanding their dynamics is becoming increasingly important. We study the limiting behavior of competitive agents employing gradient-based learning algorithms. Specifically, we introduce a general framework for competitive gradient-based learning that encompasses a wide breadth of learning algorithms including policy gradient reinforcement learning, gradient based bandits, and certain online convex optimization algorithms. We show that unlike the single agent case, gradient learning schemes in competitive settings do not necessarily correspond to gradient flows and, hence, it is possible for limiting behaviors like periodic orbits to exist. We introduce a new class of games, Morse-Smale games, that correspond to gradient-like flows. We provide guarantees that competitive gradient-based learning algorithms (both in the full information and gradient-free settings) avoid linearly unstable critical points (i.e. strict saddle points and unstable limit cycles). Since generic local Nash equilibria are not unstable critical points---that is, in a formal mathematical sense, almost all Nash equilibria are not strict saddles---these results imply that gradient-based learning almost surely does not get stuck at critical points that do not correspond to Nash equilibria. For Morse-Smale games, we show that competitive gradient learning converges to linearly stable cycles (which includes stable Nash equilibria) almost surely. Finally, we specialize these results to commonly used multi-agent learning algorithms and provide illustrative examples that demonstrate the wide range of limiting behaviors competitive gradient learning exhibits.

    04/16/2018 ∙ by Eric Mazumdar, et al. ∙ 0 share

    read it

  • Convergence Analysis of Gradient-Based Learning with Non-Uniform Learning Rates in Non-Cooperative Multi-Agent Settings

    Considering a class of gradient-based multi-agent learning algorithms in non-cooperative settings, we provide local convergence guarantees to a neighborhood of a stable local Nash equilibrium. In particular, we consider continuous games where agents learn in (i) deterministic settings with oracle access to their gradient and (ii) stochastic settings with an unbiased estimator of their gradient. Utilizing the minimum and maximum singular values of the game Jacobian, we provide finite-time convergence guarantees in the deterministic case. On the other hand, in the stochastic case, we provide concentration bounds guaranteeing that with high probability agents will converge to a neighborhood of a stable local Nash equilibrium in finite time. Different than other works in this vein, we also study the effects of non-uniform learning rates on the learning dynamics and convergence rates. We find that much like preconditioning in optimization, non-uniform learning rates cause a distortion in the vector field which can, in turn, change the rate of convergence and the shape of the region of attraction. The analysis is supported by numerical examples that illustrate different aspects of the theory. We conclude with discussion of the results and open questions.

    05/30/2019 ∙ by Benjamin Chasnov, et al. ∙ 0 share

    read it