Martha White

research

∙ 07/10/2023

Measuring and Mitigating Interference in Reinforcement Learning

Catastrophic interference is common in many network-based learning syste...

0 Vincent Liu, et al. ∙

research

∙ 05/16/2023

Coagent Networks: Generalized and Scaled

Coagent networks for reinforcement learning (RL) [Thomas and Barto, 2011...

0 James E. Kostas, et al. ∙

research

∙ 04/03/2023

Empirical Design in Reinforcement Learning

Empirical design in reinforcement learning is no small task. Running goo...

0 Andrew Patterson, et al. ∙

research

∙ 02/28/2023

The In-Sample Softmax for Offline Reinforcement Learning

Reinforcement learning (RL) agents can leverage batches of previously co...

0 Chenjun Xiao, et al. ∙

research

∙ 02/23/2023

Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments

In this work, we consider the off-policy policy evaluation problem for c...

1 Vincent Liu, et al. ∙

research

∙ 01/27/2023

Generalized Munchausen Reinforcement Learning using Tsallis KL Divergence

Many policy optimization approaches in reinforcement learning incorporat...

0 Lingwei Zhu, et al. ∙

research

∙ 01/26/2023

Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning

Off-policy learning from multistep returns is crucial for sample-efficie...

0 Brett Daley, et al. ∙

research

∙ 06/06/2022

Goal-Space Planning with Subgoal Models

This paper investigates a new approach to model-based reinforcement lear...

11 Chunlok Lo, et al. ∙

research

∙ 05/18/2022

No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL

The performance of reinforcement learning (RL) agents is sensitive to th...

0 Han Wang, et al. ∙

research

∙ 05/17/2022

Robust Losses for Learning Value Functions

Most value function learning algorithms in reinforcement learning are ba...

0 Andrew Patterson, et al. ∙

research

∙ 03/30/2022

Investigating the Properties of Neural Network Representations in Reinforcement Learning

In this paper we investigate the properties of representations learned b...

26 Han Wang, et al. ∙

research

∙ 03/22/2022

Resonance in Weight Space: Covariate Shift Can Drive Divergence of SGD with Momentum

Most convergence guarantees for stochastic gradient descent with momentu...

4 Kirby Banman, et al. ∙

research

∙ 02/22/2022

Continual Auxiliary Task Learning

Learning auxiliary tasks, such as multiple predictions about the world, ...

0 Matthew McLeod, et al. ∙

research

∙ 02/04/2022

A Temporal-Difference Approach to Policy Gradient Estimation

The policy gradient theorem (Sutton et al., 2000) prescribes the usage o...

6 Samuele Tosatto, et al. ∙

research

∙ 12/22/2021

An Alternate Policy Gradient Estimator for Softmax Policies

Policy gradient (PG) estimators for softmax policies are ineffective wit...

6 Shivam Garg, et al. ∙

research

∙ 11/16/2021

Off-Policy Actor-Critic with Emphatic Weightings

A variety of theoretically-sound policy gradient algorithms exist for th...

0 Eric Graves, et al. ∙

research

∙ 11/15/2021

Exploiting Action Impact Regularity and Partially Known Models for Offline Reinforcement Learning

Offline reinforcement learning (learning a policy from a batch of data) is...

0 Vincent Liu, et al. ∙

research

∙ 07/17/2021

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

Approximate Policy Iteration (API) algorithms alternate between (approxi...

0 Alan Chan, et al. ∙

research

∙ 05/29/2021

Predictive Representation Learning for Language Modeling

To effectively perform the task of next-word prediction, long short-term...

0 Qingfeng Lan, et al. ∙

research

∙ 04/28/2021

A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning

Many reinforcement learning algorithms rely on value estimation. However...

0 Andrew Patterson, et al. ∙

research

∙ 03/09/2021

Scalable Online Recurrent Learning Using Columnar Neural Networks

Structural credit assignment for recurrent learning is challenging. An a...

0 Khurram Javed, et al. ∙

research

∙ 12/07/2020

Perspectives on Sim2Real Transfer for Robotics: A Summary of the R:SS 2020 Workshop

This report presents the debates, posters, and discussions of the Sim2Re...

4 Sebastian Höfer, et al. ∙

research

∙ 10/23/2020

Towards Safe Policy Improvement for Non-Stationary MDPs

Many real-world sequential decision-making problems involve critical sys...

0 Yash Chandak, et al. ∙

research

∙ 10/14/2020

From Language to Language-ish: How Brain-Like is an LSTM's Representation of Nonsensical Language Stimuli?

The representations generated by many models of language (word embedding...

0 Maryam Hashemzadeh, et al. ∙

research

∙ 07/19/2020

Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities

Model-based reinforcement learning (MBRL) can significantly improve samp...

12 Jincheng Mei, et al. ∙

research

∙ 07/07/2020

Towards a practical measure of interference for reinforcement learning

Catastrophic interference is common in many network-based learning syste...

28 Vincent Liu, et al. ∙

research

∙ 07/05/2020

Selective Dyna-style Planning Under Limited Model Capacity

In model-based reinforcement learning, planning with an imperfect model ...

3 Muhammad Zaheer, et al. ∙

research

∙ 07/01/2020

Gradient Temporal-Difference Learning with Regularized Corrections

It is still common to use Q-learning and temporal difference (TD) learni...

3 Sina Ghiassian, et al. ∙

research

∙ 06/12/2020

Learning Causal Models Online

Predictive models – learned from observational data not covering the com...

11 Khurram Javed, et al. ∙

research

∙ 06/08/2020

Hallucinating Value: A Pitfall of Dyna-style Planning with Imperfect Environment Models

Dyna-style reinforcement learning (RL) agents improve sample efficiency ...

3 Taher Jafferjee, et al. ∙

research

∙ 05/17/2020

Optimizing for the Future in Non-Stationary MDPs

Most reinforcement learning methods are based upon the key assumption th...

2 Yash Chandak, et al. ∙

research

∙ 05/11/2020

Maximizing Information Gain in Partially Observable Environments via Prediction Reward

Information gathering in a partially observable environment can be formu...

32 Yash Satsangi, et al. ∙

research

∙ 02/16/2020

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

Q-learning suffers from overestimation bias, because it approximates the...

29 Qingfeng Lan, et al. ∙

research

∙ 02/14/2020

An implicit function learning approach for parametric modal regression

For multi-valued functions—such as when the conditional distribution on ...

13 Yangchen Pan, et al. ∙

research

∙ 10/03/2019

Is Fast Adaptation All You Need?

Gradient-based meta-learning has proven to be highly effective at learni...

24 Khurram Javed, et al. ∙

research

∙ 07/17/2019

Meta-descent for Online, Continual Prediction

This paper investigates different vector step-size adaptation approaches...

0 Andrew Jacobsen, et al. ∙

research

∙ 06/19/2019

Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study

Learning about many things can provide numerous benefits to a reinforcem...

2 Cam Linke, et al. ∙

research

∙ 06/18/2019

Hill Climbing on Value Estimates for Search-control in Dyna

Dyna is an architecture for model-based reinforcement learning (RL), whe...

0 Yangchen Pan, et al. ∙

research

∙ 06/11/2019

Importance Resampling for Off-policy Prediction

Importance sampling (IS) is a common reweighting strategy for off-policy...

3 Matthew Schlegel, et al. ∙

research

∙ 05/29/2019

Meta-Learning Representations for Continual Learning

A continual learning agent should be able to build on top of existing kn...

10 Khurram Javed, et al. ∙

research

∙ 04/02/2019

Planning with Expectation Models

Distribution and sample models are two popular model choices in model-ba...

20 Yi Wan, et al. ∙

research

∙ 12/03/2018

Accelerating Large Scale Knowledge Distillation via Dynamic Importance Sampling

Knowledge distillation is an effective technique that transfers knowledg...

10 Minghan Li, et al. ∙

research

∙ 11/22/2018

An Off-policy Policy Gradient Theorem Using Emphatic Weightings

Policy gradient methods are widely used for control in reinforcement lea...

0 Ehsan Imani, et al. ∙

research

∙ 11/16/2018

The Barbados 2018 List of Open Issues in Continual Learning

We want to make progress toward artificial general intelligence, namely ...

0 Tom Schaul, et al. ∙

research

∙ 11/15/2018

Context-Dependent Upper-Confidence Bounds for Directed Exploration

Directed exploration strategies for reinforcement learning are critical ...

6 Raksha Kumaraswamy, et al. ∙

research

∙ 11/15/2018

The Utility of Sparse Representations for Control in Reinforcement Learning

We investigate sparse representations for control in reinforcement learn...

4 Vincent Liu, et al. ∙

research

∙ 11/06/2018

Online Off-policy Prediction

This paper investigates the problem of online prediction learning, where...

8 Sina Ghiassian, et al. ∙

research

∙ 10/22/2018

Actor-Expert: A Framework for using Action-Value Methods in Continuous Action Spaces

Value-based approaches can be difficult to use in continuous action spac...

6 Sungsu Lim, et al. ∙

research

∙ 08/28/2018

High-confidence error estimates for learned value functions

Estimating the value function for a fixed policy is a fundamental proble...

2 Touqir Sajed, et al. ∙

research

∙ 07/18/2018

General Value Function Networks

In this paper we show that restricting the representation-layer of a Rec...

0 Matthew Schlegel, et al. ∙

