
An Information-Theoretic Perspective on Credit Assignment in Reinforcement Learning
How do we formalize the challenge of credit assignment in reinforcement ...
XLVIN: eXecuted Latent Value Iteration Nets
Value Iteration Networks (VINs) have emerged as a popular method to inco...
Graph neural induction of value iteration
Many reinforcement learning tasks can benefit from explicit planning bas...
TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?
We investigate whether Jacobi preconditioning, accounting for the bootst...
Policy Evaluation Networks
Many reinforcement learning algorithms use value functions to guide the ...
Options of Interest: Temporal Abstraction with Interest Functions
Temporal abstraction refers to the ability of an agent to use behaviours...
Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods
The policy gradient theorem is defined based on an objective with respec...
All-Action Policy Gradient Methods: A Numerical Integration Approach
While often stated as an instance of the likelihood ratio trick [Rubinst...
Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
We establish a connection between the importance sampling estimators typ...
The Barbados 2018 List of Open Issues in Continual Learning
We want to make progress toward artificial general intelligence, namely ...
Learning Robust Options
Robust reinforcement learning aims to produce policies that have strong ...
Learnings Options End-to-End for Continuous Action Tasks
We present new results on learning temporally extended actions for conti...
Learning with Options that Terminate Off-Policy
A temporally abstract action, or an option, is specified by a policy and...
When Waiting is not an Option: Learning Options with a Deliberation Cost
Recent work has shown that temporally extended actions (options) can be ...
Convergent Tree-Backup and Retrace with Function Approximation
Off-policy learning is key to scaling up reinforcement learning as it al...
A Matrix Splitting Perspective on Planning with Options
We show that the Bellman operator underlying the options framework leads...
The Option-Critic Architecture
Temporal abstraction is key to scaling up learning and planning in reinf...
