
Power-Constrained Bandits
Contextual bandits often provide simple and effective personalization in...

Value Driven Representation for Human-in-the-Loop Reinforcement Learning
Interactive adaptive systems powered by Reinforcement Learning (RL) have...

Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
When observed decisions depend only on observed features, off-policy pol...

Learning Near Optimal Policies with Low Inherent Bellman Error
We study the exploration problem with approximate linear action-value fu...

Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
Off-policy evaluation in reinforcement learning offers the chance of usi...

Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning
Accurate reporting of energy and carbon usage is essential for understan...

Sublinear Optimal Policy Value Estimation in Contextual Bandits
We study the problem of estimating the expected reward of the optimal po...

Missingness as Stability: Understanding the Structure of Missingness in Longitudinal EHR data and its Impact on Reinforcement Learning in Healthcare
There is an emerging trend in the reinforcement learning for healthcare ...

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy
While maximizing expected return is the goal in most reinforcement learn...

Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs
In order to make good decisions under uncertainty an agent must learn fro...

Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
We establish a connection between the importance sampling estimators typ...

Directed Exploration for Reinforcement Learning
Efficient exploration is necessary to achieve good sample efficiency for...

Learning When-to-Treat Policies
Many applied decision-making problems have a dynamic component: The poli...

Combining Parametric and Nonparametric Models for Off-Policy Evaluation
We consider a model-based approach to perform batch off-policy evaluatio...

PLOTS: Procedure Learning from Observations using Subtask Structure
In many cases an intelligent agent may want to learn how to mimic a sing...

Off-Policy Policy Gradient with State Distribution Correction
We study the problem of off-policy policy optimization in Markov decisio...

Separating value functions across timescales
In many finite horizon episodic reinforcement learning (RL) settings, it...

Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Strong worst-case performance bounds for episodic reinforcement learning...

Distilling Information from a Flood: A Possibility for the Use of Meta-Analysis and Systematic Review in Machine Learning Research
The current flood of information in all areas of machine learning resear...

Policy Certificates: Towards Accountable Reinforcement Learning
The performance of a reinforcement learning algorithm can vary drastical...

Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters
In this work, we consider the problem of estimating a behaviour policy f...

Sample-Efficient Deep RL with Generative Adversarial Tree Search
We propose Generative Adversarial Tree Search (GATS), a sample-efficient...

Strategic Object Oriented Reinforcement Learning
Humans learn to play video games significantly faster than state-of-the...

When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms
Efficient exploration is one of the key challenges for reinforcement lea...

Representation Balancing MDPs for Off-Policy Policy Evaluation
We study the problem of off-policy policy evaluation (OPPE) in RL. In co...

Efficient Exploration through Bayesian Deep Q-Networks
We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling...

Generalized Grounding Graphs: A Probabilistic Framework for Understanding Grounded Commands
Many task domains require robots to interpret and act upon natural langu...

On Ensuring that Intelligent Machines Are Well-Behaved
Machine learning algorithms are everywhere, ranging from simple data ana...

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines
We show how an action-dependent baseline can be used by the policy gradi...

Decoupling Learning Rules from Representations
In the artificial intelligence field, learning often corresponds to chan...

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Statistical performance bounds for reinforcement learning (RL) algorithm...

Sample Efficient Feature Selection for Factored MDPs
In reinforcement learning, the state of the real world is often represen...

Sample Efficient Policy Search for Optimal Stopping Domains
Optimal stopping problems consider the question of deciding when to stop...

Importance Sampling with Unequal Support
Importance sampling is often used in machine learning when training and ...

A PAC RL Algorithm for Episodic POMDPs
Many interesting real world domains involve reinforcement learning (RL) ...

Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
In this paper we present a new way of predicting the performance of a re...

Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
Recently, there has been significant progress in understanding reinforce...

The Online Coupon-Collector Problem and Its Application to Lifelong Reinforcement Learning
Transferring knowledge across a sequence of related tasks is an importan...

Online Stochastic Optimization under Correlated Bandit Feedback
In this paper we consider the problem of online stochastic optimization ...

Efficient Planning under Uncertainty with Macro-actions
Deciding how to act in partially observable environments remains an acti...

Sample Complexity of Multi-task Reinforcement Learning
Transferring knowledge across a sequence of reinforcement-learning tasks...

Sequential Transfer in Multi-armed Bandit with Finite Set of Models
Learning from prior tasks and transferring that experience to improve fu...

Regret Bounds for Reinforcement Learning with Policy Advice
In some reinforcement learning problems an agent may be provided with a ...

RAPID: A Reachable Anytime Planner for Imprecisely-sensed Domains
Despite the intractability of generic optimal partially observable Marko...

Emma Brunskill
Reinforcement Learning, Interactive Machine Learning, ML/AI for Education at Stanford University; Assistant Professor, Computer Science at Stanford University; Assistant Professor, Computer Science at Carnegie Mellon University from 2011 to 2017; NSF Mathematical Science Postdoctoral Fellow, Computer Science Dept. at University of California Berkeley from 2009 to 2011