
An Information-Theoretic Perspective on Credit Assignment in Reinforcement Learning
How do we formalize the challenge of credit assignment in reinforcement ...

XLVIN: eXecuted Latent Value Iteration Nets
Value Iteration Networks (VINs) have emerged as a popular method to inco...

Graph neural induction of value iteration
Many reinforcement learning tasks can benefit from explicit planning bas...

TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?
We investigate whether Jacobi preconditioning, accounting for the bootst...

Policy Evaluation Networks
Many reinforcement learning algorithms use value functions to guide the ...

Options of Interest: Temporal Abstraction with Interest Functions
Temporal abstraction refers to the ability of an agent to use behaviours...

Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods
The policy gradient theorem is defined based on an objective with respec...

All-Action Policy Gradient Methods: A Numerical Integration Approach
While often stated as an instance of the likelihood ratio trick [Rubinst...

Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
We establish a connection between the importance sampling estimators typ...

The Barbados 2018 List of Open Issues in Continual Learning
We want to make progress toward artificial general intelligence, namely ...

Learning Robust Options
Robust reinforcement learning aims to produce policies that have strong ...

Learnings Options End-to-End for Continuous Action Tasks
We present new results on learning temporally extended actions for conti...

Learning with Options that Terminate Off-Policy
A temporally abstract action, or an option, is specified by a policy and...

When Waiting is not an Option: Learning Options with a Deliberation Cost
Recent work has shown that temporally extended actions (options) can be ...

Convergent Tree-Backup and Retrace with Function Approximation
Off-policy learning is key to scaling up reinforcement learning as it al...

A Matrix Splitting Perspective on Planning with Options
We show that the Bellman operator underlying the options framework leads...

The Option-Critic Architecture
Temporal abstraction is key to scaling up learning and planning in reinf...
Pierre-Luc Bacon