We study how to learn ϵ-optimal strategies in zero-sum imperfect
informa...
Mirror descent value iteration (MDVI), an abstraction of Kullback-Leible...
In this work, we derive sharp non-asymptotic deviation bounds for weight...
We present a novel, alternative framework for learning generative models...
We consider the reinforcement learning (RL) setting, in which the agent ...
Imperfect information games (IIG) are games in which each player only
pa...
We consider reinforcement learning in an environment modeled by an episo...
In this work, we consider and analyze the sample complexity of model-fre...
We propose the Bayes-UCBVI algorithm for reinforcement learning in tabul...
We consider a multi-armed bandit problem specified by a set of
one-dimen...
We introduce a generic strategy for provably efficient multi-goal
explor...
We investigate the problem-dependent regime in the stochastic Thresholdi...
We study the problem of learning a Nash equilibrium (NE) in an imperfect...
We consider a stochastic bandit problem with a possibly infinite number ...
We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new algo...
In this paper, we propose new problem-independent lower bounds on the sa...
Realistic environments often provide agents with very limited feedback. ...
In this work, we propose KeRNS: an algorithm for episodic reinforcement
...
We study a structured variant of the multi-armed bandit problem specifie...
We investigate an active pure-exploration setting that includes best-ar...
We consider a multi-armed bandit problem specified by a set of Gaussian ...
We investigate the stochastic Thresholding Bandit problem (TBP) under se...
Reward-free exploration is a reinforcement learning setting recently stu...
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algo...
We consider the exploration-exploitation dilemma in finite-horizon
reinf...
We investigate and provide new insights on the sampling rule called Top-...
Pure exploration (aka active testing) is the fundamental task of sequent...
We present a new algorithm based on gradient ascent for a general Act...
In the context of K-armed stochastic bandits with distribution only assu...
We analyze the sample complexity of the thresholding bandit problem, wit...
We propose the kl-UCB++ algorithm for regret minimization in stochastic...