
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
Approximate Policy Iteration (API) algorithms alternate between (approxi...

Predictive Representation Learning for Language Modeling
To effectively perform the task of next-word prediction, long short-term...

A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning
Many reinforcement learning algorithms rely on value estimation. However...

Scalable Online Recurrent Learning Using Columnar Neural Networks
Structural credit assignment for recurrent learning is challenging. An a...

Perspectives on Sim2Real Transfer for Robotics: A Summary of the R:SS 2020 Workshop
This report presents the debates, posters, and discussions of the Sim2Re...

Towards Safe Policy Improvement for Non-Stationary MDPs
Many real-world sequential decision-making problems involve critical sys...

From Language to Language-ish: How Brain-Like is an LSTM's Representation of Nonsensical Language Stimuli?
The representations generated by many models of language (word embedding...

Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities
Model-based reinforcement learning (MBRL) can significantly improve samp...

Towards a practical measure of interference for reinforcement learning
Catastrophic interference is common in many network-based learning syste...

Selective Dyna-style Planning Under Limited Model Capacity
In model-based reinforcement learning, planning with an imperfect model ...

Gradient Temporal-Difference Learning with Regularized Corrections
It is still common to use Q-learning and temporal difference (TD) learni...

Learning Causal Models Online
Predictive models – learned from observational data not covering the com...

Hallucinating Value: A Pitfall of Dyna-style Planning with Imperfect Environment Models
Dyna-style reinforcement learning (RL) agents improve sample efficiency ...

Optimizing for the Future in Non-Stationary MDPs
Most reinforcement learning methods are based upon the key assumption th...

Maximizing Information Gain in Partially Observable Environments via Prediction Reward
Information gathering in a partially observable environment can be formu...

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning
Q-learning suffers from overestimation bias, because it approximates the...

An implicit function learning approach for parametric modal regression
For multivalued functions—such as when the conditional distribution on ...

Is Fast Adaptation All You Need?
Gradient-based meta-learning has proven to be highly effective at learni...

Meta-descent for Online, Continual Prediction
This paper investigates different vector step-size adaptation approaches...

Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study
Learning about many things can provide numerous benefits to a reinforcem...

Hill Climbing on Value Estimates for Search-control in Dyna
Dyna is an architecture for model-based reinforcement learning (RL), whe...

Importance Resampling for Off-policy Prediction
Importance sampling (IS) is a common reweighting strategy for off-policy...

Meta-Learning Representations for Continual Learning
A continual learning agent should be able to build on top of existing kn...

Planning with Expectation Models
Distribution and sample models are two popular model choices in model-ba...

Accelerating Large Scale Knowledge Distillation via Dynamic Importance Sampling
Knowledge distillation is an effective technique that transfers knowledg...

An Off-policy Policy Gradient Theorem Using Emphatic Weightings
Policy gradient methods are widely used for control in reinforcement lea...

The Barbados 2018 List of Open Issues in Continual Learning
We want to make progress toward artificial general intelligence, namely ...

Context-Dependent Upper-Confidence Bounds for Directed Exploration
Directed exploration strategies for reinforcement learning are critical ...

The Utility of Sparse Representations for Control in Reinforcement Learning
We investigate sparse representations for control in reinforcement learn...

Online Off-policy Prediction
This paper investigates the problem of online prediction learning, where...

Actor-Expert: A Framework for using Action-Value Methods in Continuous Action Spaces
Value-based approaches can be difficult to use in continuous action spac...

High-confidence error estimates for learned value functions
Estimating the value function for a fixed policy is a fundamental proble...

General Value Function Networks
In this paper we show that restricting the representation-layer of a Rec...

Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control
Recent work has shown that reinforcement learning (RL) is a promising ap...

Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains
Model-based strategies for control are critical to obtain sample efficie...

Improving Regression Performance with Distributional Losses
There is growing evidence that converting targets to soft targets in sup...

Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods
This paper investigates estimating the variance of a temporal-difference...

Learning Sparse Representations in Reinforcement Learning with Sparse Coding
A variety of representation learning approaches have been investigated f...

Recovering True Classifier Performance in Positive-Unlabeled Learning
A common approach in positive-unlabeled learning is to train a classific...

Accelerated Gradient Temporal Difference Learning
The family of temporal difference (TD) methods span a spectrum from comp...

Unifying task specification in reinforcement learning
Reinforcement learning tasks are typically specified as Markov decision ...

A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning
One of the main obstacles to broad application of reinforcement learning...

Estimating the class prior and posterior from noisy positives and unlabeled data
We develop a classification algorithm for estimating posterior distribut...

Identifying global optimality for dictionary learning
Learning new representations of input observations in machine learning i...

Investigating practical linear temporal difference learning
Off-policy reinforcement learning has many applications including: learn...

Nonparametric semi-supervised learning of class proportions
The problem of developing binary classifiers from positive and unlabeled...

Incremental Truncated LSTD
Balancing between computational efficiency and sample efficiency is an i...

Partition Tree Weighting
This paper introduces the Partition Tree Weighting technique, an efficie...
Martha White
verified profile
Associate Professor at the University of Alberta