
Deep Bayesian Quadrature Policy Optimization
We study the problem of obtaining accurate policy gradient estimates. Th...

Control-Aware Representations for Model-based Reinforcement Learning
A major challenge in modern reinforcement learning (RL) is efficient con...

Stochastic Bandits with Linear Constraints
We study a constrained contextual linear bandit setting, where the goal ...

Variational Model-based Policy Optimization
Model-based reinforcement learning (RL) algorithms allow us to combine m...

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity
In this paper, we introduce proximal gradient temporal difference learni...

Automatic Policy Synthesis to Improve the Safety of Nonlinear Dynamical Systems
Learning controllers merely based on a performance metric has been prove...

Mirror Descent Policy Optimization
We propose deep Reinforcement Learning (RL) algorithms inspired by mirro...

Active Model Estimation in Markov Decision Processes
We study the problem of efficient exploration in order to learn an accur...

Predictive Coding for Locally-Linear Control
High-dimensional observations and unknown dynamics are major challenges ...

Policy-Aware Model Learning for Policy Gradient Methods
This paper considers the problem of learning a model in model-based rein...

Improved Algorithms for Conservative Exploration in Bandits
In many fields such as digital marketing, healthcare, finance, and robot...

Conservative Exploration in Reinforcement Learning
While learning in an unknown Markov Decision Process (MDP), an agent sho...

Adaptive Sampling for Estimating Multiple Probability Distributions
We consider the problem of allocating samples to a finite set of discret...

Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning
Multi-step greedy policies have been extensively used in model-based Rei...

Benchmarking Batch Deep Reinforcement Learning Algorithms
Widely-used deep reinforcement learning algorithms have been shown to fa...

Multi-Step Greedy and Approximate Real Time Dynamic Programming
Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming...

Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control
Many real-world sequential decision-making problems can be formulated as...

Randomized Exploration in Generalized Linear Bandits
We study two randomized algorithms for generalized linear bandits, GLM-T...

Active Learning for Binary Classification with Abstention
We construct and analyze active learning algorithms for the problem of b...

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
State-of-the-art efficient model-based Reinforcement Learning (RL) algor...

Binary Classification with Bounded Abstention Rate
We consider the problem of binary classification with abstention in the ...

Perturbed-History Exploration in Stochastic Linear Bandits
We propose a new online algorithm for minimizing the cumulative regret i...

Perturbed-History Exploration in Stochastic Multi-Armed Bandits
We propose an online algorithm for cumulative regret minimization in a s...

Lyapunov-based Safe Policy Optimization for Continuous Control
We study continuous action reinforcement learning problems in which it i...

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
We propose a multi-armed bandit algorithm that explores based on randomi...

A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
Risk management in dynamic decision problems is a primary concern in man...

Risk-Sensitive Generative Adversarial Imitation Learning
We study risk-sensitive imitation learning where the agent's goal is to ...

A Lyapunov-based Approach to Safe Reinforcement Learning
In many real-world reinforcement learning (RL) problems, besides optimiz...

Optimizing over a Restricted Policy Class in Markov Decision Processes
We address the problem of finding an optimal policy in a Markov decision...

Path Consistency Learning in Tsallis Entropy Regularized MDPs
We study the sparse entropy-regularized reinforcement learning (ERL) pro...

More Robust Doubly Robust Off-policy Evaluation
We study the problem of off-policy evaluation (OPE) in reinforcement lea...

Online Learning to Rank in Stochastic Click Models
Online learning to rank is a core problem in information retrieval and m...

Active Learning for Accurate Estimation of Linear Models
We explore the sequential decision making problem where the goal is to e...

Conservative Contextual Linear Bandits
Safety is a desirable property that can immensely increase the applicabi...

Bayesian Reinforcement Learning: A Survey
Bayesian methods for machine learning have been widely investigated, yie...

Safe Policy Improvement by Minimizing Robust Baseline Regret
An important problem in sequential decision-making under uncertainty is ...

Graphical Model Sketch
Structured high-cardinality data arises in many domains, and poses a maj...

Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
In many sequential decision-making problems one is interested in minimiz...

Policy Gradient for Coherent Risk Measures
Several authors have recently developed risk-sensitive policy gradient m...

Classification-based Approximate Policy Iteration: Experiments and Extended Discussions
Tackling large approximate dynamic programming or reinforcement learning...

Algorithms for CVaR Optimization in MDPs
In many sequential decision-making problems we may want to manage risk b...

Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs
In many sequential decision-making problems we may want to manage risk b...

A Dantzig Selector Approach to Temporal Difference Learning
LSTD is a popular algorithm for value function approximation. Whenever t...

Approximate Modified Policy Iteration
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm ...

A Generalized Kernel Approach to Structured Output Learning
We study the problem of structured output learning from a regression per...
Mohammad Ghavamzadeh
Senior Research Scientist at Google DeepMind, Mountain View (on leave from INRIA)