
Non-Stationary Latent Bandits
Users of recommender systems often behave in a non-stationary fashion, d...

CoinDICE: Off-Policy Confidence Interval Estimation
We study high-confidence behavior-agnostic off-policy evaluation in rein...

Safe Reinforcement Learning with Natural Language Constraints
In this paper, we tackle the problem of learning control policies for ta...

Control-Aware Representations for Model-based Reinforcement Learning
A major challenge in modern reinforcement learning (RL) is efficient con...

Latent Bandits Revisited
A latent bandit problem is one in which the learning agent knows the arm...

Piecewise-Stationary Off-Policy Optimization
Off-policy learning is a framework for evaluating and optimizing policie...

Variational Model-based Policy Optimization
Model-based reinforcement learning (RL) algorithms allow us to combine m...

Predictive Coding for Locally-Linear Control
High-dimensional observations and unknown dynamics are major challenges ...

BRPO: Batch Residual Policy Optimization
In batch reinforcement learning (RL), one often constrains a learned pol...

AlgaeDICE: Policy Gradient from Arbitrary Experience
In many real-world applications of reinforcement learning (RL), interact...

CAQL: Continuous Action Q-Learning
Value-based reinforcement learning (RL) methods like Q-learning have sho...

Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control
Many real-world sequential decision-making problems can be formulated as...

DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
In many real-world reinforcement learning applications, access to the en...

Lyapunov-based Safe Policy Optimization for Continuous Control
We study continuous action reinforcement learning problems in which it i...

A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
Risk management in dynamic decision problems is a primary concern in man...

Risk-Sensitive Generative Adversarial Imitation Learning
We study risk-sensitive imitation learning where the agent's goal is to ...

A Lyapunov-based Approach to Safe Reinforcement Learning
In many real-world reinforcement learning (RL) problems, besides optimiz...

Path Consistency Learning in Tsallis Entropy Regularized MDPs
We study the sparse entropy-regularized reinforcement learning (ERL) pro...

More Robust Doubly Robust Off-policy Evaluation
We study the problem of off-policy evaluation (OPE) in reinforcement lea...

Safe Policy Improvement by Minimizing Robust Baseline Regret
An important problem in sequential decision-making under uncertainty is ...

Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
In many sequential decision-making problems one is interested in minimiz...

Two Phase Q-learning for Bidding-based Vehicle Sharing
We consider one-way vehicle sharing systems where customers can rent a c...

Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach
In this paper we address the problem of decision making within a Markov ...

Policy Gradient for Coherent Risk Measures
Several authors have recently developed risk-sensitive policy gradient m...

Algorithms for CVaR Optimization in MDPs
In many sequential decision-making problems we may want to manage risk b...