
-
Reinforcement Learning with Trajectory Feedback
The computational model of reinforcement learning is based upon the abil...
-
Bandits with Partially Observable Offline Data
We study linear contextual bandits with access to a large, partially obs...
-
Mirror Descent Policy Optimization
We propose deep Reinforcement Learning (RL) algorithms inspired by mirro...
-
Exploration-Exploitation in Constrained MDPs
In many sequential decision-making problems, the goal is to optimize a u...
-
Optimistic Policy Optimization with Bandit Feedback
Policy optimization methods are one of the most widely used classes of R...
-
Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning
Multi-step greedy policies have been extensively used in model-based Rei...
-
Multi-Step Greedy and Approximate Real Time Dynamic Programming
Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming...
-
Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs
Trust region policy optimization (TRPO) is a popular and empirically suc...
-
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
State-of-the-art efficient model-based Reinforcement Learning (RL) algor...
-
Action Robust Reinforcement Learning and Applications in Continuous Control
A policy is said to be robust if it maximizes the reward while consideri...
-
Revisiting Exploration-Conscious Reinforcement Learning
The objective of Reinforcement Learning is to learn an optimal policy by...
-
How to Combine Tree-Search Methods in Reinforcement Learning
Finite-horizon lookahead policies are abundantly used in Reinforcement L...
-
Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning
Multiple-step lookahead policies have demonstrated high empirical compet...
-
Beyond the One Step Greedy Approach in Reinforcement Learning
The famous Policy Iteration algorithm alternates between policy improvem...