
Minimax Regret for Stochastic Shortest Path
We study the Stochastic Shortest Path (SSP) problem in which an agent ha...

RL for Latent MDPs: Regret Guarantees and a Lower Bound
In this work, we consider the regret minimization problem for reinforcem...

Reinforcement Learning with Trajectory Feedback
The computational model of reinforcement learning is based upon the abil...

Bandits with Partially Observable Offline Data
We study linear contextual bandits with access to a large, partially obs...

Mirror Descent Policy Optimization
We propose deep Reinforcement Learning (RL) algorithms inspired by mirro...

Exploration-Exploitation in Constrained MDPs
In many sequential decision-making problems, the goal is to optimize a u...

Optimistic Policy Optimization with Bandit Feedback
Policy optimization methods are one of the most widely used classes of R...

Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning
Multi-step greedy policies have been extensively used in model-based Rei...

Multi-Step Greedy and Approximate Real Time Dynamic Programming
Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming...

Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs
Trust region policy optimization (TRPO) is a popular and empirically suc...

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
State-of-the-art efficient model-based Reinforcement Learning (RL) algor...

Action Robust Reinforcement Learning and Applications in Continuous Control
A policy is said to be robust if it maximizes the reward while consideri...

Revisiting Exploration-Conscious Reinforcement Learning
The objective of Reinforcement Learning is to learn an optimal policy by...

How to Combine Tree-Search Methods in Reinforcement Learning
Finite-horizon lookahead policies are abundantly used in Reinforcement L...

Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning
Multiple-step lookahead policies have demonstrated high empirical compet...

Beyond the One Step Greedy Approach in Reinforcement Learning
The famous Policy Iteration algorithm alternates between policy improvem...
Yonathan Efroni