
Deep Bayesian Quadrature Policy Optimization
We study the problem of obtaining accurate policy gradient estimates. Th...
Control-Aware Representations for Model-based Reinforcement Learning
A major challenge in modern reinforcement learning (RL) is efficient con...
Stochastic Bandits with Linear Constraints
We study a constrained contextual linear bandit setting, where the goal ...
Variational Model-based Policy Optimization
Model-based reinforcement learning (RL) algorithms allow us to combine m...
Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity
In this paper, we introduce proximal gradient temporal difference learni...
Automatic Policy Synthesis to Improve the Safety of Nonlinear Dynamical Systems
Learning controllers merely based on a performance metric has been prove...
Mirror Descent Policy Optimization
We propose deep Reinforcement Learning (RL) algorithms inspired by mirro...
Active Model Estimation in Markov Decision Processes
We study the problem of efficient exploration in order to learn an accur...
Predictive Coding for Locally-Linear Control
High-dimensional observations and unknown dynamics are major challenges ...
Policy-Aware Model Learning for Policy Gradient Methods
This paper considers the problem of learning a model in model-based rein...
Improved Algorithms for Conservative Exploration in Bandits
In many fields such as digital marketing, healthcare, finance, and robot...
Conservative Exploration in Reinforcement Learning
While learning in an unknown Markov Decision Process (MDP), an agent sho...
Adaptive Sampling for Estimating Multiple Probability Distributions
We consider the problem of allocating samples to a finite set of discret...
Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning
Multi-step greedy policies have been extensively used in model-based Rei...
Benchmarking Batch Deep Reinforcement Learning Algorithms
Widely-used deep reinforcement learning algorithms have been shown to fa...
Multi-Step Greedy and Approximate Real Time Dynamic Programming
Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming...
Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control
Many real-world sequential decision-making problems can be formulated as...
Randomized Exploration in Generalized Linear Bandits
We study two randomized algorithms for generalized linear bandits, GLMT...
Active Learning for Binary Classification with Abstention
We construct and analyze active learning algorithms for the problem of b...
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
State-of-the-art efficient model-based Reinforcement Learning (RL) algor...
Binary Classification with Bounded Abstention Rate
We consider the problem of binary classification with abstention in the ...
Perturbed-History Exploration in Stochastic Linear Bandits
We propose a new online algorithm for minimizing the cumulative regret i...
Perturbed-History Exploration in Stochastic Multi-Armed Bandits
We propose an online algorithm for cumulative regret minimization in a s...
Lyapunov-based Safe Policy Optimization for Continuous Control
We study continuous action reinforcement learning problems in which it i...
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
We propose a multi-armed bandit algorithm that explores based on randomi...
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
Risk management in dynamic decision problems is a primary concern in man...
Risk-Sensitive Generative Adversarial Imitation Learning
We study risk-sensitive imitation learning where the agent's goal is to ...
A Lyapunov-based Approach to Safe Reinforcement Learning
In many real-world reinforcement learning (RL) problems, besides optimiz...
Optimizing over a Restricted Policy Class in Markov Decision Processes
We address the problem of finding an optimal policy in a Markov decision...
Path Consistency Learning in Tsallis Entropy Regularized MDPs
We study the sparse entropy-regularized reinforcement learning (ERL) pro...
More Robust Doubly Robust Off-policy Evaluation
We study the problem of off-policy evaluation (OPE) in reinforcement lea...
Online Learning to Rank in Stochastic Click Models
Online learning to rank is a core problem in information retrieval and m...
Active Learning for Accurate Estimation of Linear Models
We explore the sequential decision making problem where the goal is to e...
Conservative Contextual Linear Bandits
Safety is a desirable property that can immensely increase the applicabi...
Bayesian Reinforcement Learning: A Survey
Bayesian methods for machine learning have been widely investigated, yie...
Safe Policy Improvement by Minimizing Robust Baseline Regret
An important problem in sequential decision-making under uncertainty is ...
Graphical Model Sketch
Structured high-cardinality data arises in many domains, and poses a maj...
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
In many sequential decision-making problems one is interested in minimiz...
Policy Gradient for Coherent Risk Measures
Several authors have recently developed risk-sensitive policy gradient m...
Classification-based Approximate Policy Iteration: Experiments and Extended Discussions
Tackling large approximate dynamic programming or reinforcement learning...
Algorithms for CVaR Optimization in MDPs
In many sequential decision-making problems we may want to manage risk b...
Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs
In many sequential decision-making problems we may want to manage risk b...
A Dantzig Selector Approach to Temporal Difference Learning
LSTD is a popular algorithm for value function approximation. Whenever t...
Approximate Modified Policy Iteration
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm ...
A Generalized Kernel Approach to Structured Output Learning
We study the problem of structured output learning from a regression per...
