Large language models (LLMs), such as GPT-4, have shown remarkable perfo...
Existing online learning algorithms for adversarial Markov Decision Proc...
Regret Matching+ (RM+) and its variants are important algorithms for sol...
We revisit the problem of learning in two-player zero-sum Markov games, ...
We study the problem of designing adaptive multi-armed bandit algorithms...
Reinforcement Learning (RL) with constraints is becoming an increasingly...
Supply chain management (SCM) has been recognized as an important discip...
We study high-probability regret bounds for adversarial K-armed bandits ...
A recent paper by Piliouras et al. [2021, 2022] introduces an uncoupled ...
A recent line of work has established uncoupled learning dynamics such t...
We consider regret minimization for Adversarial Markov Decision Processe...
We initiate the study of dynamic regret minimization for goal-oriented r...
In this paper we establish efficient and uncoupled learning dynamics so ...
We consider the problem of combining and learning over a set of adversar...
We consider the problem of adversarial bandit convex optimization, that ...
Policy optimization is among the most popular and successful reinforceme...
While extensive-form games (EFGs) can be converted into normal-form game...
We study regret minimization for infinite-horizon average-reward Markov ...
The standard assumption in reinforcement learning (RL) is that agents ob...
Learning from repeated play in a fixed two-player zero-sum game is a cla...
We introduce two new no-regret algorithms for the stochastic shortest pa...
Policy optimization is a widely-used method in reinforcement learning. D...
Regret-based algorithms are highly efficient at finding approximate Nash...
We introduce a generic template for developing regret minimization algor...
We consider the problem of online reinforcement learning for the Stochas...
We consider the best-of-both-worlds problem for learning an episodic Mar...
In this work, we develop linear bandit algorithms that automatically ada...
We propose a black-box reduction that turns a certain reinforcement lear...
We make significant progress toward the stochastic shortest path problem...
We study infinite-horizon discounted two-player zero-sum Markov games, a...
We resolve the long-standing "impossible tuning" issue for the classic e...
We study the stochastic shortest path problem with adversarial costs and...
We develop several new algorithms for learning Markov Decision Processes...
We study bandit convex optimization methods that adapt to the norm of th...
Online machine learning systems need to adapt to domain shifts. Meanwhil...
In statistical learning, algorithms for model selection allow the learne...
Optimistic Gradient Descent Ascent (OGDA) algorithm for saddle-point opt...
We develop a new approach to obtaining high probability regret bounds fo...
This work studies the problem of learning episodic Markov Decision Proce...
Recently, model-free reinforcement learning has attracted research atten...
We revisit the problem of online learning with sleeping experts/bandits:...
We initiate the study of learning in contextual bandits with the help of...
We study small-loss bounds for the adversarial multi-armed bandits probl...
When an AI system interacts with multiple users, it frequently needs to ...
We consider the problem of learning in episodic finite-horizon Markov de...
Model-free reinforcement learning is known to be memory and computation ...
We introduce the problem of model selection for contextual bandits, wher...
We propose the first reduction-based approach to obtaining long-term mem...
We present an extensive study of generalization for data-dependent hypot...
We propose the first contextual bandit algorithm that is parameter-free,...