
Policy-Gradient Algorithms Have No Guarantees of Convergence in Continuous Action and State Multi-Agent Settings
We show by counterexample that policy-gradient algorithms have no guarantees of even local convergence to Nash equilibria in continuous action and state space multi-agent settings. To do so, we analyze gradient-play in N-player general-sum linear quadratic games. In such games the state and action spaces are continuous and the unique global Nash equilibrium can be found by solving coupled Riccati equations. Further, gradient-play in LQ games is equivalent to multi-agent policy gradient. We first prove that the only critical point of the gradient dynamics in these games is the unique global Nash equilibrium. We then give sufficient conditions under which policy gradient will avoid the Nash equilibrium, and generate a large number of general-sum linear quadratic games that satisfy these conditions. The existence of such games indicates that one of the most popular approaches to solving reinforcement learning problems in the classic reinforcement learning setting has no guarantee of convergence in multi-agent settings. Further, the ease with which we can generate these counterexamples suggests that such situations are not mere edge cases and are in fact quite common.
07/08/2019 ∙ by Eric Mazumdar, et al.
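The avoidance phenomenon can be illustrated with a minimal sketch, using a hypothetical two-player scalar quadratic game rather than the LQ games analyzed in the paper: the unique critical point below is a local Nash equilibrium (each player's cost is strictly convex in its own variable), yet simultaneous gradient play is repelled from it. The costs f1, f2 and all numbers are illustrative choices, not from the paper.

```python
# Hypothetical two-player game (illustrative, not from the paper):
#   Player 1 minimizes f1(x, y) = x**2 / 2 - 3*x*y over x.
#   Player 2 minimizes f2(x, y) = y**2 / 2 - 3*x*y over y.
# The origin is the unique critical point and a local Nash equilibrium,
# but the game Jacobian J = [[1, -3], [-3, 1]] has eigenvalues 4 and -2,
# so the gradient dynamics have an unstable direction at the equilibrium.

def gradient_play(x, y, lr=0.01, steps=1000):
    for _ in range(steps):
        gx = x - 3 * y  # d f1 / dx
        gy = y - 3 * x  # d f2 / dy
        x, y = x - lr * gx, y - lr * gy  # simultaneous gradient steps
    return x, y

# Start at a tiny perturbation of the equilibrium (0, 0) along the
# unstable direction (1, 1): the iterates diverge instead of converging.
x, y = gradient_play(0.01, 0.01)
print(abs(x))  # grows roughly like 0.01 * (1 + 2*lr)**steps
```

Along the stable direction (1, -1) the same dynamics contract toward the origin, which is exactly the saddle-like behavior that makes the Nash equilibrium avoidable from almost all initializations.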

Inverse Risk-Sensitive Reinforcement Learning
We address the problem of inverse reinforcement learning in Markov decision processes where the agent is risk-sensitive. In particular, we model risk-sensitivity in a reinforcement learning framework by making use of models of human decision-making having their origins in behavioral psychology, behavioral economics, and neuroscience. We propose a gradient-based inverse reinforcement learning algorithm that minimizes a loss function defined on the observed behavior. We demonstrate the performance of the proposed technique on two examples, the first of which is the canonical Grid World example and the second of which is a Markov decision process modeling passengers' decisions regarding ride-sharing. In the latter, we use pricing and travel time data from a ride-sharing company to construct the transition probabilities and rewards of the Markov decision process.
03/29/2017 ∙ by Lillian J. Ratliff, et al.
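The core loss-minimization idea can be sketched in a drastically simplified setting: a hypothetical single-state softmax choice model (a crude stand-in for the paper's risk-sensitive MDP model), where we recover reward parameters by gradient descent on the negative log-likelihood of observed choice frequencies. All model details here are illustrative assumptions, not the paper's algorithm.

```python
# Hypothetical toy model: the agent picks action a with probability
# proportional to exp(r[a]). We fit reward parameters theta by gradient
# descent on the mean negative log-likelihood of observed frequencies.
import math

def softmax(theta):
    z = [math.exp(t) for t in theta]
    s = sum(z)
    return [v / s for v in z]

def fit_rewards(freqs, lr=0.5, steps=500):
    theta = [0.0] * len(freqs)
    for _ in range(steps):
        p = softmax(theta)
        # gradient of the mean NLL: p[a] - observed frequency of a
        grad = [p[a] - freqs[a] for a in range(len(theta))]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

# "Observed behavior": choice frequencies under true rewards (1.0, 0.0).
true_p = softmax([1.0, 0.0])
theta = fit_rewards(true_p)
print(softmax(theta))  # close to true_p
```

Note that theta is only identified up to a constant shift (softmax is shift-invariant), which is why the fit is checked on choice probabilities rather than on the raw reward values.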

On the Convergence of Competitive, Multi-Agent Gradient-Based Learning
As learning algorithms are increasingly deployed in markets and other competitive environments, understanding their dynamics is becoming increasingly important. We study the limiting behavior of competitive agents employing gradient-based learning algorithms. Specifically, we introduce a general framework for competitive gradient-based learning that encompasses a wide breadth of learning algorithms including policy gradient reinforcement learning, gradient-based bandits, and certain online convex optimization algorithms. We show that unlike the single-agent case, gradient learning schemes in competitive settings do not necessarily correspond to gradient flows and, hence, it is possible for limiting behaviors like periodic orbits to exist. We introduce a new class of games, Morse-Smale games, that correspond to gradient-like flows. We provide guarantees that competitive gradient-based learning algorithms (both in the full-information and gradient-free settings) avoid linearly unstable critical points (i.e., strict saddle points and unstable limit cycles). Since generic local Nash equilibria are not unstable critical points (that is, in a formal mathematical sense, almost all Nash equilibria are not strict saddles), these results imply that gradient-based learning almost surely does not get stuck at critical points that do not correspond to Nash equilibria. For Morse-Smale games, we show that competitive gradient learning converges to linearly stable cycles (which include stable Nash equilibria) almost surely. Finally, we specialize these results to commonly used multi-agent learning algorithms and provide illustrative examples that demonstrate the wide range of limiting behaviors competitive gradient learning exhibits.
04/16/2018 ∙ by Eric Mazumdar, et al.
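The simplest standard example of the "not a gradient flow" behavior is the zero-sum bilinear game, sketched below (an illustrative textbook example, not one taken from the paper): the continuous-time dynamics orbit the equilibrium, and discrete simultaneous gradient descent spirals outward.

```python
# Bilinear zero-sum game: f1(x, y) = x*y = -f2(x, y).
# Continuous-time gradient dynamics trace circles around the
# equilibrium (0, 0); the discrete simultaneous updates below
# multiply the squared radius by exactly (1 + lr**2) per step,
# so the iterates spiral outward rather than converging.

def simultaneous_gradient_descent(x, y, lr=0.1, steps=100):
    for _ in range(steps):
        x, y = x - lr * y, y + lr * x  # each player descends its own cost
    return x, y

x, y = simultaneous_gradient_descent(1.0, 0.0)
r = (x * x + y * y) ** 0.5
print(r)  # radius after 100 steps: (1 + lr**2) ** 50
```

Since no scalar potential can decrease along an orbit that returns near its starting point at larger radius, these dynamics cannot be a gradient flow, matching the abstract's claim.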

Convergence Analysis of Gradient-Based Learning with Non-Uniform Learning Rates in Non-Cooperative Multi-Agent Settings
Considering a class of gradient-based multi-agent learning algorithms in non-cooperative settings, we provide local convergence guarantees to a neighborhood of a stable local Nash equilibrium. In particular, we consider continuous games where agents learn in (i) deterministic settings with oracle access to their gradient and (ii) stochastic settings with an unbiased estimator of their gradient. Utilizing the minimum and maximum singular values of the game Jacobian, we provide finite-time convergence guarantees in the deterministic case. In the stochastic case, we provide concentration bounds guaranteeing that, with high probability, agents will converge to a neighborhood of a stable local Nash equilibrium in finite time. Unlike other works in this vein, we also study the effects of non-uniform learning rates on the learning dynamics and convergence rates. We find that, much like preconditioning in optimization, non-uniform learning rates cause a distortion in the vector field which can, in turn, change the rate of convergence and the shape of the region of attraction. The analysis is supported by numerical examples that illustrate different aspects of the theory. We conclude with a discussion of the results and open questions.
05/30/2019 ∙ by Benjamin Chasnov, et al.
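The preconditioning analogy can be made concrete with a minimal sketch on a hypothetical 2x2 game Jacobian (an illustrative linearization, not an example from the paper): the joint update is x_{k+1} = x_k - Lambda J x_k, where the diagonal matrix Lambda holds the individual learning rates, so changing Lambda reshapes the eigenvalues of I - Lambda J and hence the convergence rate.

```python
# Hypothetical stable game Jacobian (eigenvalues 1 +/- 2i), chosen
# for illustration; each agent updates its own coordinate with its
# own learning rate.
J = [[1.0, 2.0], [-2.0, 1.0]]

def steps_to_converge(rates, x=(1.0, 1.0), tol=1e-3, max_steps=10000):
    x0, x1 = x
    for k in range(max_steps):
        if (x0 * x0 + x1 * x1) ** 0.5 < tol:
            return k
        g0 = J[0][0] * x0 + J[0][1] * x1  # agent 1's gradient component
        g1 = J[1][0] * x0 + J[1][1] * x1  # agent 2's gradient component
        x0, x1 = x0 - rates[0] * g0, x1 - rates[1] * g1
    return max_steps

uniform = steps_to_converge((0.1, 0.1))
nonuniform = steps_to_converge((0.1, 0.05))
print(uniform, nonuniform)  # both converge, at different rates
```

Both rate choices converge here, but the non-uniform choice distorts the rotation-contraction structure of the uniform dynamics and slows convergence, which is the vector-field distortion effect the abstract describes.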