Philip S. Thomas

research

∙ 05/16/2023

Coagent Networks: Generalized and Scaled

Coagent networks for reinforcement learning (RL) [Thomas and Barto, 2011...

0 James E. Kostas, et al. ∙

research

∙ 02/06/2023

Optimization using Parallel Gradient Evaluations on Multiple Parameters

We propose a first-order method for convex optimization, where instead o...

0 Yash Chandak, et al. ∙

research

∙ 01/24/2023

Off-Policy Evaluation for Action-Dependent Non-Stationary Environments

Methods for sequential decision-making are often built upon a foundation...

6 Yash Chandak, et al. ∙

research

∙ 08/24/2022

Enforcing Delayed-Impact Fairness Guarantees

Recent research has shown that seemingly fair machine learning models, w...

0 Aline Weber, et al. ∙

research

∙ 06/06/2022

Adaptive Rollout Length for Model-Based RL Using Model-Free Deep RL

Model-based reinforcement learning promises to learn an optimal policy f...

2 Abhinav Bhatia, et al. ∙

research

∙ 12/10/2021

Edge-Compatible Reinforcement Learning for Recommendations

Most reinforcement learning (RL) recommendation systems designed for edg...

0 James E. Kostas, et al. ∙

research

∙ 11/06/2021

SOPE: Spectrum of Off-Policy Estimators

Many sequential decision making problems are high-stakes and require off...

0 Christina J. Yuan, et al. ∙

research

∙ 06/06/2021

Towards Practical Mean Bounds for Small Samples

Historically, to bound the mean for small sample sizes, practitioners ha...

0 My Phan, et al. ∙

research

∙ 05/31/2021

Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs

We study the problem of Safe Policy Improvement (SPI) under constraints ...

5 Harsh Satija, et al. ∙

research

∙ 04/26/2021

Universal Off-Policy Evaluation

When faced with sequential decision-making problems, it is often useful ...

0 Yash Chandak, et al. ∙

research

∙ 01/25/2021

High-Confidence Off-Policy (or Counterfactual) Variance Estimation

Many sequential decision-making systems leverage data collected using pr...

12 Yash Chandak, et al. ∙

research

∙ 10/23/2020

Towards Safe Policy Improvement for Non-Stationary MDPs

Many real-world sequential decision-making problems involve critical sys...

0 Yash Chandak, et al. ∙

research

∙ 09/15/2020

Reinforcement Learning for Strategic Recommendations

Strategic recommendations (SR) refer to the problem where an intelligent...

0 Georgios Theocharous, et al. ∙

research

∙ 06/30/2020

Evaluating the Performance of Reinforcement Learning Algorithms

Performance evaluations are critical for quantifying algorithmic advance...

0 Scott M. Jordan, et al. ∙

research

∙ 05/17/2020

Optimizing for the Future in Non-Stationary MDPs

Most reinforcement learning methods are based upon the key assumption th...

2 Yash Chandak, et al. ∙

research

∙ 01/06/2020

Learning Reusable Options for Multi-Task Reinforcement Learning

Reinforcement learning (RL) has become an increasingly active area of re...

27 Francisco M. Garcia, et al. ∙

research

∙ 10/15/2019

Reinforcement learning with spiking coagents

Neuroscientific theory suggests that dopaminergic neurons broadcast glob...

64 Sneha Aenugu, et al. ∙

research

∙ 06/17/2019

Is the Policy Gradient a Gradient?

The policy gradient theorem describes the gradient of the expected disco...

0 Chris Nota, et al. ∙

research

∙ 06/06/2019

Classical Policy Gradient: Preserving Bellman's Principle of Optimality

We propose a new objective function for finite-horizon episodic Markov d...

0 Philip S. Thomas, et al. ∙

research

∙ 06/05/2019

Reinforcement Learning When All Actions are Not Always Available

The Markov decision process (MDP) formulation used to model many real-wo...

0 Yash Chandak, et al. ∙

research

∙ 06/05/2019

Lifelong Learning with a Changing Action Set

In many real-world sequential decision making problems, the number of av...

0 Yash Chandak, et al. ∙

research

∙ 05/15/2019

A New Confidence Interval for the Mean of a Bounded Random Variable

We present a new method for constructing a confidence interval for the m...

0 Erik Learned-Miller, et al. ∙

research

∙ 02/15/2019

Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock

In this paper we introduce a reinforcement learning (RL) approach for tr...

0 James Kostas, et al. ∙

research

∙ 02/15/2019

Reinforcement Learning Without Backpropagation or a Clock

In this paper we introduce a reinforcement learning (RL) approach for tr...

0 James Kostas, et al. ∙

research

∙ 02/03/2019

A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning

In this paper we consider the problem of how a reinforcement learning ag...

12 Francisco M. Garcia, et al. ∙

research

∙ 02/01/2019

Learning Action Representations for Reinforcement Learning

Most model-free reinforcement learning methods leverage state representa...

0 Yash Chandak, et al. ∙

research

∙ 02/01/2019

Privacy Preserving Off-Policy Evaluation

Many reinforcement learning applications involve the use of data that is...

0 Tengyang Xie, et al. ∙

research

∙ 12/04/2018

Natural Option Critic

The recently proposed option-critic architecture (Bacon et al.) provides a ...

0 Saket Tiwari, et al. ∙

research

∙ 08/17/2017

On Ensuring that Intelligent Machines Are Well-Behaved

Machine learning algorithms are everywhere, ranging from simple data ana...

0 Philip S. Thomas, et al. ∙

research

∙ 06/20/2017

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

We show how an action-dependent baseline can be used by the policy gradi...

0 Philip S. Thomas, et al. ∙

research

∙ 06/12/2017

Data-Efficient Policy Evaluation Through Behavior Policy Search

We consider the task of evaluating a policy for a Markov decision proces...

0 Josiah P. Hanna, et al. ∙

research

∙ 06/09/2017

Decoupling Learning Rules from Representations

In the artificial intelligence field, learning often corresponds to chan...

0 Philip S. Thomas, et al. ∙

research

∙ 11/10/2016

Importance Sampling with Unequal Support

Importance sampling is often used in machine learning when training and ...

0 Philip S. Thomas, et al. ∙

research

∙ 04/04/2016

Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning

In this paper we present a new way of predicting the performance of a re...

0 Philip S. Thomas, et al. ∙

research

∙ 12/30/2015

A Notation for Markov Decision Processes

This paper specifies a notation for Markov decision processes....

0 Philip S. Thomas, et al. ∙

research

∙ 12/15/2015

Increasing the Action Gap: New Operators for Reinforcement Learning

This paper introduces new optimality-preserving operators on Q-functions...

0 Marc G. Bellemare, et al. ∙
