Emma Brunskill

research

∙ 07/05/2023

Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization

Simple regret minimization is a critical problem in learning optimal tre...

0 Sanath Kumar Krishnamurthy, et al. ∙

research

∙ 06/26/2023

Supervised Pretraining Can Learn In-Context Reinforcement Learning

Large transformer models trained on diverse datasets have shown a remark...

0 Jonathan N. Lee, et al. ∙

research

∙ 06/24/2023

Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets

Despite the recent advancements in offline reinforcement learning via su...

0 Anirudhan Badrinath, et al. ∙

research

∙ 06/21/2023

Automated Reminders Reduce Incarceration for Missed Court Dates: Evidence from a Text Message Experiment

Millions of Americans must attend mandatory court dates every year. To b...

0 Alex Chohlas-Wood, et al. ∙

research

∙ 04/11/2023

Reinforcement Learning Tutor Better Supported Lower Performers in a Math Task

Resource limitations make it hard to provide all students with one of th...

0 Sherry Ruan, et al. ∙

research

∙ 02/19/2023

Estimating Optimal Policy Value in General Linear Contextual Bandits

In many bandit problems, the maximal reward achievable by a policy is of...

0 Jonathan N. Lee, et al. ∙

research

∙ 01/26/2023

Model-based Offline Reinforcement Learning with Local Misspecification

We present a model-based offline reinforcement learning policy performan...

0 Kefan Dong, et al. ∙

research

∙ 11/16/2022

Giving Feedback on Interactive Student Programs with Meta-Exploration

Developing interactive software, such as websites or games, is a particu...

0 Evan Zheran Liu, et al. ∙

research

∙ 11/03/2022

Oracle Inequalities for Model Selection in Offline Reinforcement Learning

In offline reinforcement learning (RL), a learner leverages prior logged...

3 Jonathan N. Lee, et al. ∙

research

∙ 10/16/2022

Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data

Offline reinforcement learning (RL) can be used to improve future perfor...

0 Allen Nie, et al. ∙

research

∙ 07/01/2022

Offline Policy Optimization with Eligible Actions

Offline policy optimization could have a large impact on many real-world...

0 Yao Liu, et al. ∙

research

∙ 12/30/2021

Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning

Online reinforcement learning (RL) algorithms are often difficult to dep...

0 Tong Mu, et al. ∙

research

∙ 11/28/2021

Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation

Off-policy policy evaluation methods for sequential decision making can ...

0 Ramtin Keramati, et al. ∙

research

∙ 11/15/2021

Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects

There are a number of available methods that can be used for choosing wh...

0 Steve Yadlowsky, et al. ∙

research

∙ 10/27/2021

Play to Grade: Testing Coding Games as Classifying Markov Decision Process

Contemporary coding education often presents students with the task of d...

0 Allen Nie, et al. ∙

research

∙ 09/18/2021

Learning to be Fair: A Consequentialist Approach to Equitable Decision-Making

In the dominant paradigm for designing equitable machine learning system...

0 Alex Chohlas-Wood, et al. ∙

research

∙ 08/19/2021

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Actor-critic methods are widely used in offline reinforcement learning p...

0 Andrea Zanette, et al. ∙

research

∙ 07/21/2021

Design of Experiments for Stochastic Contextual Linear Bandits

In the stochastic linear contextual bandit setting there exist several m...

0 Andrea Zanette, et al. ∙

research

∙ 04/26/2021

Universal Off-Policy Evaluation

When faced with sequential decision-making problems, it is often useful ...

0 Yash Chandak, et al. ∙

research

∙ 11/19/2020

Online Model Selection for Reinforcement Learning with Function Approximation

Deep reinforcement learning has achieved impressive successes yet often ...

0 Jonathan N. Lee, et al. ∙

research

∙ 08/18/2020

Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

There has been growing progress on theoretical analyses for provably eff...

2 Andrea Zanette, et al. ∙

research

∙ 07/16/2020

Provably Good Batch Reinforcement Learning Without Great Exploration

Batch reinforcement learning (RL) is important to apply RL algorithms to...

11 Yao Liu, et al. ∙

research

∙ 07/12/2020

Learning Abstract Models for Strategic Exploration and Fast Reward Transfer

Model-based reinforcement learning (RL) is appealing because (i) it enab...

12 Evan Zheran Liu, et al. ∙

research

∙ 04/13/2020

Power-Constrained Bandits

Contextual bandits often provide simple and effective personalization in...

1 Jiayu Yao, et al. ∙

research

∙ 04/02/2020

Value Driven Representation for Human-in-the-Loop Reinforcement Learning

Interactive adaptive systems powered by Reinforcement Learning (RL) have...

5 Ramtin Keramati, et al. ∙

research

∙ 03/12/2020

Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding

When observed decisions depend only on observed features, off-policy pol...

5 Hongseok Namkoong, et al. ∙

research

∙ 02/29/2020

Learning Near Optimal Policies with Low Inherent Bellman Error

We study the exploration problem with approximate linear action-value fu...

15 Andrea Zanette, et al. ∙

research

∙ 02/10/2020

Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions

Off-policy evaluation in reinforcement learning offers the chance of usi...

5 Omer Gottesman, et al. ∙

research

∙ 01/31/2020

Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning

Accurate reporting of energy and carbon usage is essential for understan...

0 Peter Henderson, et al. ∙

research

∙ 12/12/2019

Sublinear Optimal Policy Value Estimation in Contextual Bandits

We study the problem of estimating the expected reward of the optimal po...

0 Weihao Kong, et al. ∙

research

∙ 11/16/2019

Missingness as Stability: Understanding the Structure of Missingness in Longitudinal EHR data and its Impact on Reinforcement Learning in Healthcare

There is an emerging trend in the reinforcement learning for healthcare ...

0 Scott L. Fleming, et al. ∙

research

∙ 11/05/2019

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

While maximizing expected return is the goal in most reinforcement learn...

0 Ramtin Keramati, et al. ∙

research

∙ 11/03/2019

Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs

In order to make good decisions under uncertainty an agent must learn fro...

0 Andrea Zanette, et al. ∙

research

∙ 10/15/2019

Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling

We establish a connection between the importance sampling estimators typ...

0 Yao Liu, et al. ∙

research

∙ 06/18/2019

Directed Exploration for Reinforcement Learning

Efficient exploration is necessary to achieve good sample efficiency for...

0 Zhaohan Daniel Guo, et al. ∙

research

∙ 05/23/2019

Learning When-to-Treat Policies

Many applied decision-making problems have a dynamic component: The poli...

0 Xinkun Nie, et al. ∙

research

∙ 05/14/2019

Combining Parametric and Nonparametric Models for Off-Policy Evaluation

We consider a model-based approach to perform batch off-policy evaluatio...

0 Omer Gottesman, et al. ∙

research

∙ 04/17/2019

PLOTS: Procedure Learning from Observations using Subtask Structure

In many cases an intelligent agent may want to learn how to mimic a sing...

0 Tong Mu, et al. ∙

research

∙ 04/17/2019

Off-Policy Policy Gradient with State Distribution Correction

We study the problem of off-policy policy optimization in Markov decisio...

0 Yao Liu, et al. ∙

research

∙ 02/05/2019

Separating value functions across time-scales

In many finite horizon episodic reinforcement learning (RL) settings, it...

0 Joshua Romoff, et al. ∙

research

∙ 01/01/2019

Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds

Strong worst-case performance bounds for episodic reinforcement learning...

0 Andrea Zanette, et al. ∙

research

∙ 12/03/2018

Distilling Information from a Flood: A Possibility for the Use of Meta-Analysis and Systematic Review in Machine Learning Research

The current flood of information in all areas of machine learning resear...

0 Peter Henderson, et al. ∙

research

∙ 11/07/2018

Policy Certificates: Towards Accountable Reinforcement Learning

The performance of a reinforcement learning algorithm can vary drastical...

0 Christoph Dann, et al. ∙

research

∙ 07/03/2018

Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters

In this work, we consider the problem of estimating a behaviour policy f...

0 Aniruddh Raghu, et al. ∙

research

∙ 06/15/2018

Sample-Efficient Deep RL with Generative Adversarial Tree Search

We propose Generative Adversarial Tree Search (GATS), a sample-efficient...

4 Kamyar Azizzadenesheli, et al. ∙

research

∙ 06/01/2018

Strategic Object Oriented Reinforcement Learning

Humans learn to play video games significantly faster than state-of-the-...

0 Ramtin Keramati, et al. ∙

research

∙ 05/23/2018

When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms

Efficient exploration is one of the key challenges for reinforcement lea...

0 Yao Liu, et al. ∙

research

∙ 05/23/2018

Representation Balancing MDPs for Off-Policy Policy Evaluation

We study the problem of off-policy policy evaluation (OPPE) in RL. In co...

0 Yao Liu, et al. ∙

research

∙ 02/13/2018

Efficient Exploration through Bayesian Deep Q-Networks

We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling...

0 Kamyar Azizzadenesheli, et al. ∙

research

∙ 11/29/2017

Generalized Grounding Graphs: A Probabilistic Framework for Understanding Grounded Commands

Many task domains require robots to interpret and act upon natural langu...

1 Thomas Kollar, et al. ∙
