
Minimax Regret for Stochastic Shortest Path
We study the Stochastic Shortest Path (SSP) problem in which an agent ha...
RL for Latent MDPs: Regret Guarantees and a Lower Bound
In this work, we consider the regret minimization problem for reinforcem...
Reinforcement Learning with Trajectory Feedback
The computational model of reinforcement learning is based upon the abil...
Bandits with Partially Observable Offline Data
We study linear contextual bandits with access to a large, partially obs...
Mirror Descent Policy Optimization
We propose deep Reinforcement Learning (RL) algorithms inspired by mirro...
Exploration-Exploitation in Constrained MDPs
In many sequential decision-making problems, the goal is to optimize a u...
Optimistic Policy Optimization with Bandit Feedback
Policy optimization methods are one of the most widely used classes of R...
Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning
Multi-step greedy policies have been extensively used in model-based Rei...
Multi-Step Greedy and Approximate Real Time Dynamic Programming
Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming...
Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs
Trust region policy optimization (TRPO) is a popular and empirically suc...
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
State-of-the-art efficient model-based Reinforcement Learning (RL) algor...
Action Robust Reinforcement Learning and Applications in Continuous Control
A policy is said to be robust if it maximizes the reward while consideri...
Revisiting Exploration-Conscious Reinforcement Learning
The objective of Reinforcement Learning is to learn an optimal policy by...
How to Combine Tree-Search Methods in Reinforcement Learning
Finite-horizon lookahead policies are abundantly used in Reinforcement L...
Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning
Multiple-step lookahead policies have demonstrated high empirical compet...
Beyond the One Step Greedy Approach in Reinforcement Learning
The famous Policy Iteration algorithm alternates between policy improvem...
