
Towards Practical Mean Bounds for Small Samples
Historically, to bound the mean for small sample sizes, practitioners ha...

Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs
We study the problem of Safe Policy Improvement (SPI) under constraints ...

Universal Off-Policy Evaluation
When faced with sequential decision-making problems, it is often useful ...

High-Confidence Off-Policy (or Counterfactual) Variance Estimation
Many sequential decision-making systems leverage data collected using pr...

Towards Safe Policy Improvement for Non-Stationary MDPs
Many real-world sequential decision-making problems involve critical sys...

Reinforcement Learning for Strategic Recommendations
Strategic recommendations (SR) refer to the problem where an intelligent...

Evaluating the Performance of Reinforcement Learning Algorithms
Performance evaluations are critical for quantifying algorithmic advance...

Optimizing for the Future in Non-Stationary MDPs
Most reinforcement learning methods are based upon the key assumption th...

Learning Reusable Options for Multi-Task Reinforcement Learning
Reinforcement learning (RL) has become an increasingly active area of re...

Reinforcement learning with spiking coagents
Neuroscientific theory suggests that dopaminergic neurons broadcast glob...

Is the Policy Gradient a Gradient?
The policy gradient theorem describes the gradient of the expected disco...

Classical Policy Gradient: Preserving Bellman's Principle of Optimality
We propose a new objective function for finite-horizon episodic Markov d...

Reinforcement Learning When All Actions are Not Always Available
The Markov decision process (MDP) formulation used to model many real-wo...

Lifelong Learning with a Changing Action Set
In many real-world sequential decision making problems, the number of av...

A New Confidence Interval for the Mean of a Bounded Random Variable
We present a new method for constructing a confidence interval for the m...

Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock
In this paper we introduce a reinforcement learning (RL) approach for tr...

Reinforcement Learning Without Backpropagation or a Clock
In this paper we introduce a reinforcement learning (RL) approach for tr...

A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
In this paper we consider the problem of how a reinforcement learning ag...

Learning Action Representations for Reinforcement Learning
Most model-free reinforcement learning methods leverage state representa...

Privacy Preserving Off-Policy Evaluation
Many reinforcement learning applications involve the use of data that is...

Natural Option Critic
The recently proposed option-critic architecture Bacon et al. provide a ...

On Ensuring that Intelligent Machines Are Well-Behaved
Machine learning algorithms are everywhere, ranging from simple data ana...

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines
We show how an action-dependent baseline can be used by the policy gradi...

Data-Efficient Policy Evaluation Through Behavior Policy Search
We consider the task of evaluating a policy for a Markov decision proces...

Decoupling Learning Rules from Representations
In the artificial intelligence field, learning often corresponds to chan...

Importance Sampling with Unequal Support
Importance sampling is often used in machine learning when training and ...

Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
In this paper we present a new way of predicting the performance of a re...

A Notation for Markov Decision Processes
This paper specifies a notation for Markov decision processes....

Increasing the Action Gap: New Operators for Reinforcement Learning
This paper introduces new optimality-preserving operators on Q-functions...
Philip S. Thomas