
Randomized Exploration for Reinforcement Learning with General Value Function Approximation
We propose a model-free reinforcement learning algorithm inspired by the...
A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation
Marginalized importance sampling (MIS), which measures the density ratio...
Preferential Temporal Difference Learning
Temporal-Difference (TD) learning is a general and very useful tool for ...
Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation
This paper is about the problem of learning a stochastic policy for gene...
Correcting Momentum in Temporal Difference Learning
A common optimization tool used in deep reinforcement learning is moment...
A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning
We present an end-to-end, model-based deep reinforcement learning agent ...
Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL
We study session-based recommendation scenarios where we want to recomme...
AndroidEnv: A Reinforcement Learning Platform for Android
We introduce AndroidEnv, an open-source platform for Reinforcement Learn...
What is Going on Inside Recurrent Meta Reinforcement Learning Agents?
Recurrent meta reinforcement learning (meta-RL) agents are agents that e...
Training a First-Order Theorem Prover from Synthetic Data
A major challenge in applying machine learning to automated theorem prov...
Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards
A major challenge in reinforcement learning is the design of exploration...
Towards Continual Reinforcement Learning: A Review and Perspectives
In this article, we aim to provide a literature review of different form...
Gradient Starvation: A Learning Proclivity in Neural Networks
We identify and formalize a fundamental gradient descent phenomenon resu...
Diversity-Enriched Option-Critic
Temporal abstraction allows reinforcement learning agents to represent k...
A Study of Policy Gradient on a Class of Exactly Solvable Models
Policy gradient methods are extensively used in reinforcement learning a...
Forethought and Hindsight in Credit Assignment
We address the problem of credit assignment in reinforcement learning an...
Connecting Weighted Automata, Tensor Networks and Recurrent Neural Networks through Spectral Learning
In this paper, we present connections between three models used in diffe...
A Fully Tensorized Recurrent Neural Network
Recurrent neural networks (RNNs) are powerful tools for sequential model...
Reward Propagation Using Graph Convolutional Networks
Potential-based reward shaping provides an approach for designing good r...
Complete the Missing Half: Augmenting Aggregation Filtering with Diversification for Graph Convolutional Networks
The core operation of Graph Neural Networks (GNNs) is the aggregation en...
Training Matters: Unlocking Potentials of Deeper Graph Convolutional Neural Networks
The performance limit of Graph Convolutional Networks (GCNs) and the fac...
An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay
Prioritized Experience Replay (PER) is a deep reinforcement learning tec...
What can I do here? A Theory of Affordances in Reinforcement Learning
Reinforcement learning algorithms usually assume that all actions are al...
Learning to Prove from Synthetic Theorems
A major challenge in applying machine learning to automated theorem prov...
A Brief Look at Generalization in Visual Meta-Reinforcement Learning
Due to the realization that deep reinforcement learning algorithms train...
Learning to cooperate: Emergent communication in multi-agent navigation
Emergent communication in artificial agents has been studied to understa...
A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms
We present a distributional approach to theoretical analyses of reinforc...
Interference and Generalization in Temporal Difference Learning
We study the link between generalization and interference in temporal-di...
Invariant Causal Prediction for Block MDPs
Generalization across environments is critical to the successful applica...
Policy Evaluation Networks
Many reinforcement learning algorithms use value functions to guide the ...
oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions
Explicit engineering of reward functions for given environments has been...
Value-driven Hindsight Modelling
Value estimation is a critical component of the reinforcement learning (...
Provably efficient reconstruction of policy networks
Recent research has shown that learning policies parametrized by large ...
Options of Interest: Temporal Abstraction with Interest Functions
Temporal abstraction refers to the ability of an agent to use behaviours...
Shaping representations through communication: community size effect in artificial learning systems
Motivated by theories of language and communication that explain why com...
Marginalized State Distribution Entropy Regularization in Policy Optimization
Entropy regularization is used to get improved optimization performance ...
Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning
We study the problem of off-policy critic evaluation in several variants...
Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods
The policy gradient theorem is defined based on an objective with respec...
Hindsight Credit Assignment
We consider the problem of efficient credit assignment in reinforcement ...
Option-critic in cooperative multi-agent systems
In this paper, we investigate learning temporal abstractions in cooperat...
Algorithmic Improvements for Deep Reinforcement Learning applied to Interactive Fiction
Text-based games are a natural challenge domain for deep reinforcement l...
Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning
Learning and planning in partially-observable domains is one of the most...
Navigation Agents for the Visually Impaired: A Sidewalk Simulator and Experiments
Millions of blind and visually-impaired (BVI) people navigate urban envi...
Actor Critic with Differentially Private Critic
Reinforcement learning algorithms are known to be sample inefficient, an...
Augmenting learning using symmetry in a biologically-inspired domain
Invariances to translation, rotation and other spatial transformations a...
Avoidance Learning Using Observational Reinforcement Learning
Imitation learning seeks to learn an expert policy from sampled demonstr...
Revisit Policy Optimization in Matrix Form
In tabular case, when the reward and environment dynamics are known, pol...
An Empirical Study of Batch Normalization and Group Normalization in Conditional Computation
Batch normalization has been widely used to improve optimization in deep...
Self-supervised Learning of Distance Functions for Goal-Conditioned Reinforcement Learning
Goal-conditioned policies are used in order to break down complex reinfo...
Neural Transfer Learning for Cry-based Diagnosis of Perinatal Asphyxia
Despite continuing medical advances, the rate of newborn morbidity and m...
Doina Precup
Associate Professor, School of Computer Science at McGill University