
-
Expert Selection in High-Dimensional Markov Decision Processes
In this work we present a multi-armed bandit framework for online expert...
-
Technical Report: Adaptive Control for Linearizable Systems Using On-Policy Reinforcement Learning
This paper proposes a framework for adaptively learning a feedback linea...
-
On Thompson Sampling with Langevin Algorithms
Thompson sampling is a methodology for multi-armed bandit problems that ...
-
Local Nash Equilibria are Isolated, Strict Local Nash Equilibria in 'Almost All' Zero-Sum Continuous Games
We prove that differential Nash equilibria are generic amongst local Nas...
-
Feedback Linearization for Unknown Systems via Reinforcement Learning
We present a novel approach to control design for nonlinear systems, whi...
-
Policy-Gradient Algorithms Have No Guarantees of Convergence in Continuous Action and State Multi-Agent Settings
We show by counterexample that policy-gradient algorithms have no guaran...
-
Convergence Analysis of Gradient-Based Learning with Non-Uniform Learning Rates in Non-Cooperative Multi-Agent Settings
Considering a class of gradient-based multi-agent learning algorithms in...
-
On the Convergence of Competitive, Multi-Agent Gradient-Based Learning
As learning algorithms are increasingly deployed in markets and other co...
-
Inverse Risk-Sensitive Reinforcement Learning
We address the problem of inverse reinforcement learning in Markov decis...