
Non-Stationary Latent Bandits
Users of recommender systems often behave in a non-stationary fashion, d...
CoinDICE: Off-Policy Confidence Interval Estimation
We study high-confidence behavior-agnostic off-policy evaluation in rein...
Safe Reinforcement Learning with Natural Language Constraints
In this paper, we tackle the problem of learning control policies for ta...
Control-Aware Representations for Model-based Reinforcement Learning
A major challenge in modern reinforcement learning (RL) is efficient con...
Latent Bandits Revisited
A latent bandit problem is one in which the learning agent knows the arm...
Piecewise-Stationary Off-Policy Optimization
Off-policy learning is a framework for evaluating and optimizing policie...
Variational Model-based Policy Optimization
Model-based reinforcement learning (RL) algorithms allow us to combine m...
Predictive Coding for Locally-Linear Control
High-dimensional observations and unknown dynamics are major challenges ...
BRPO: Batch Residual Policy Optimization
In batch reinforcement learning (RL), one often constrains a learned pol...
AlgaeDICE: Policy Gradient from Arbitrary Experience
In many real-world applications of reinforcement learning (RL), interact...
CAQL: Continuous Action Q-Learning
Value-based reinforcement learning (RL) methods like Q-learning have sho...
Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control
Many real-world sequential decision-making problems can be formulated as...
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
In many real-world reinforcement learning applications, access to the en...
Lyapunov-based Safe Policy Optimization for Continuous Control
We study continuous action reinforcement learning problems in which it i...
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
Risk management in dynamic decision problems is a primary concern in man...
Risk-Sensitive Generative Adversarial Imitation Learning
We study risk-sensitive imitation learning where the agent's goal is to ...
A Lyapunov-based Approach to Safe Reinforcement Learning
In many real-world reinforcement learning (RL) problems, besides optimiz...
Path Consistency Learning in Tsallis Entropy Regularized MDPs
We study the sparse entropy-regularized reinforcement learning (ERL) pro...
More Robust Doubly Robust Off-policy Evaluation
We study the problem of off-policy evaluation (OPE) in reinforcement lea...
Safe Policy Improvement by Minimizing Robust Baseline Regret
An important problem in sequential decision-making under uncertainty is ...
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
In many sequential decision-making problems one is interested in minimiz...
Two Phase Q-learning for Bidding-based Vehicle Sharing
We consider one-way vehicle sharing systems where customers can rent a c...
Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach
In this paper we address the problem of decision making within a Markov ...
Policy Gradient for Coherent Risk Measures
Several authors have recently developed risk-sensitive policy gradient m...
Algorithms for CVaR Optimization in MDPs
In many sequential decision-making problems we may want to manage risk b...
