
Towards Practical Mean Bounds for Small Samples
Historically, to bound the mean for small sample sizes, practitioners ha...
Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs
We study the problem of Safe Policy Improvement (SPI) under constraints ...
Universal Off-Policy Evaluation
When faced with sequential decision-making problems, it is often useful ...
High-Confidence Off-Policy (or Counterfactual) Variance Estimation
Many sequential decision-making systems leverage data collected using pr...
Towards Safe Policy Improvement for Non-Stationary MDPs
Many real-world sequential decision-making problems involve critical sys...
Reinforcement Learning for Strategic Recommendations
Strategic recommendations (SR) refer to the problem where an intelligent...
Evaluating the Performance of Reinforcement Learning Algorithms
Performance evaluations are critical for quantifying algorithmic advance...
Optimizing for the Future in Non-Stationary MDPs
Most reinforcement learning methods are based upon the key assumption th...
Learning Reusable Options for Multi-Task Reinforcement Learning
Reinforcement learning (RL) has become an increasingly active area of re...
Reinforcement learning with spiking coagents
Neuroscientific theory suggests that dopaminergic neurons broadcast glob...
Is the Policy Gradient a Gradient?
The policy gradient theorem describes the gradient of the expected disco...
Classical Policy Gradient: Preserving Bellman's Principle of Optimality
We propose a new objective function for finite-horizon episodic Markov d...
Reinforcement Learning When All Actions are Not Always Available
The Markov decision process (MDP) formulation used to model many real-wo...
Lifelong Learning with a Changing Action Set
In many real-world sequential decision making problems, the number of av...
A New Confidence Interval for the Mean of a Bounded Random Variable
We present a new method for constructing a confidence interval for the m...
Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock
In this paper we introduce a reinforcement learning (RL) approach for tr...
Reinforcement Learning Without Backpropagation or a Clock
In this paper we introduce a reinforcement learning (RL) approach for tr...
A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
In this paper we consider the problem of how a reinforcement learning ag...
Learning Action Representations for Reinforcement Learning
Most model-free reinforcement learning methods leverage state representa...
Privacy Preserving Off-Policy Evaluation
Many reinforcement learning applications involve the use of data that is...
Natural Option Critic
The recently proposed option-critic architecture (Bacon et al.) provides a ...
On Ensuring that Intelligent Machines Are Well-Behaved
Machine learning algorithms are everywhere, ranging from simple data ana...
Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines
We show how an action-dependent baseline can be used by the policy gradi...
Data-Efficient Policy Evaluation Through Behavior Policy Search
We consider the task of evaluating a policy for a Markov decision proces...
Decoupling Learning Rules from Representations
In the artificial intelligence field, learning often corresponds to chan...
Importance Sampling with Unequal Support
Importance sampling is often used in machine learning when training and ...
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
In this paper we present a new way of predicting the performance of a re...
A Notation for Markov Decision Processes
This paper specifies a notation for Markov decision processes....
Increasing the Action Gap: New Operators for Reinforcement Learning
This paper introduces new optimality-preserving operators on Q-functions...
Philip S. Thomas
