
Thompson Sampling with a Mixture Prior
We study Thompson sampling (TS) in online decision-making problems where...

Parameter and Feature Selection in Stochastic Linear Bandits
We study two model selection settings in stochastic linear bandits (LB)....

Fixed-Budget Best-Arm Identification in Contextual Bandits: A Static-Adaptive Algorithm
We study the problem of best-arm identification (BAI) in contextual band...

Adaptive Sampling for Minimax Fair Classification
Machine learning models trained on imbalanced datasets can often end up ...

Non-Stationary Latent Bandits
Users of recommender systems often behave in a non-stationary fashion, d...

Soft-Robust Algorithms for Handling Model Misspecification
In reinforcement learning, robust policies for high-stakes decision-maki...

A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges
Uncertainty quantification (UQ) plays a pivotal role in reduction of unc...

Variance-Reduced Off-Policy Memory-Efficient Policy Search
Off-policy policy optimization is a challenging problem in reinforcement...

Deep Bayesian Quadrature Policy Optimization
We study the problem of obtaining accurate policy gradient estimates. Th...

Control-Aware Representations for Model-based Reinforcement Learning
A major challenge in modern reinforcement learning (RL) is efficient con...

Stochastic Bandits with Linear Constraints
We study a constrained contextual linear bandit setting, where the goal ...

Variational Model-based Policy Optimization
Model-based reinforcement learning (RL) algorithms allow us to combine m...

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity
In this paper, we introduce proximal gradient temporal difference learni...

Automatic Policy Synthesis to Improve the Safety of Nonlinear Dynamical Systems
Learning controllers merely based on a performance metric has been prove...

Mirror Descent Policy Optimization
We propose deep Reinforcement Learning (RL) algorithms inspired by mirro...

Active Model Estimation in Markov Decision Processes
We study the problem of efficient exploration in order to learn an accur...

Predictive Coding for Locally-Linear Control
High-dimensional observations and unknown dynamics are major challenges ...

Policy-Aware Model Learning for Policy Gradient Methods
This paper considers the problem of learning a model in model-based rein...

Improved Algorithms for Conservative Exploration in Bandits
In many fields such as digital marketing, healthcare, finance, and robot...

Conservative Exploration in Reinforcement Learning
While learning in an unknown Markov Decision Process (MDP), an agent sho...

Adaptive Sampling for Estimating Multiple Probability Distributions
We consider the problem of allocating samples to a finite set of discret...

Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning
Multi-step greedy policies have been extensively used in model-based Rei...

Benchmarking Batch Deep Reinforcement Learning Algorithms
Widely-used deep reinforcement learning algorithms have been shown to fa...

Multi-Step Greedy and Approximate Real Time Dynamic Programming
Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming...

Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control
Many real-world sequential decision-making problems can be formulated as...

Randomized Exploration in Generalized Linear Bandits
We study two randomized algorithms for generalized linear bandits, GLMT...

Active Learning for Binary Classification with Abstention
We construct and analyze active learning algorithms for the problem of b...

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
State-of-the-art efficient model-based Reinforcement Learning (RL) algor...

Binary Classification with Bounded Abstention Rate
We consider the problem of binary classification with abstention in the ...

Perturbed-History Exploration in Stochastic Linear Bandits
We propose a new online algorithm for minimizing the cumulative regret i...

Perturbed-History Exploration in Stochastic Multi-Armed Bandits
We propose an online algorithm for cumulative regret minimization in a s...

Lyapunov-based Safe Policy Optimization for Continuous Control
We study continuous action reinforcement learning problems in which it i...

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
We propose a multi-armed bandit algorithm that explores based on randomi...

A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
Risk management in dynamic decision problems is a primary concern in man...

Risk-Sensitive Generative Adversarial Imitation Learning
We study risk-sensitive imitation learning where the agent's goal is to ...

A Lyapunov-based Approach to Safe Reinforcement Learning
In many real-world reinforcement learning (RL) problems, besides optimiz...

Optimizing over a Restricted Policy Class in Markov Decision Processes
We address the problem of finding an optimal policy in a Markov decision...

Path Consistency Learning in Tsallis Entropy Regularized MDPs
We study the sparse entropy-regularized reinforcement learning (ERL) pro...

More Robust Doubly Robust Off-policy Evaluation
We study the problem of off-policy evaluation (OPE) in reinforcement lea...

Online Learning to Rank in Stochastic Click Models
Online learning to rank is a core problem in information retrieval and m...

Active Learning for Accurate Estimation of Linear Models
We explore the sequential decision making problem where the goal is to e...

Conservative Contextual Linear Bandits
Safety is a desirable property that can immensely increase the applicabi...

Bayesian Reinforcement Learning: A Survey
Bayesian methods for machine learning have been widely investigated, yie...

Safe Policy Improvement by Minimizing Robust Baseline Regret
An important problem in sequential decision-making under uncertainty is ...

Graphical Model Sketch
Structured high-cardinality data arises in many domains, and poses a maj...

Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
In many sequential decision-making problems one is interested in minimiz...

Policy Gradient for Coherent Risk Measures
Several authors have recently developed risk-sensitive policy gradient m...

Classification-based Approximate Policy Iteration: Experiments and Extended Discussions
Tackling large approximate dynamic programming or reinforcement learning...

Algorithms for CVaR Optimization in MDPs
In many sequential decision-making problems we may want to manage risk b...

Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs
In many sequential decision-making problems we may want to manage risk b...
Mohammad Ghavamzadeh
Senior Research Scientist at Google DeepMind Mountain View (on leave from INRIA)