Mohammad Ghavamzadeh

research

∙ 06/02/2023

A Convex Relaxation Approach to Bayesian Regret Minimization in Offline Bandits

Algorithms for offline bandits must optimize decisions in uncertain envi...

0 Mohammad Ghavamzadeh, et al. ∙

research

∙ 05/25/2023

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

Learning from human feedback has been shown to improve text-to-image mod...

0 Ying Fan, et al. ∙

research

∙ 05/12/2023

Private and Communication-Efficient Algorithms for Entropy Estimation

Modern statistical estimation is often performed in a distributed settin...

0 Gecia Bravo Hermsdorff, et al. ∙

research

∙ 04/24/2023

On Dynamic Program Decompositions of Static Risk Measures

Optimizing static risk-averse objectives in Markov decision processes is...

0 Jia Lin Hau, et al. ∙

research

∙ 04/22/2023

A Review of Deep Learning for Video Captioning

Video captioning (VC) is a fast-moving, cross-disciplinary area of resea...

0 Moloud Abdar, et al. ∙

research

∙ 02/23/2023

Aligning Text-to-Image Models using Human Feedback

Deep generative models have shown impressive results in text-to-image sy...

1 Kimin Lee, et al. ∙

research

∙ 02/21/2023

Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management

Reinforcement learning (RL) has shown great promise for developing dialo...

0 Dhawal Gupta, et al. ∙

research

∙ 12/09/2022

Multi-Task Off-Policy Learning from Bandit Feedback

Many practical applications, such as recommender systems and learning to...

0 Joey Hong, et al. ∙

research

∙ 11/25/2022

Operator Splitting Value Iteration

We introduce new planning and reinforcement learning algorithms for disc...

0 Amin Rakhsha, et al. ∙

research

∙ 09/09/2022

RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk

Prior work on safe Reinforcement Learning (RL) has studied risk-aversion...

0 Jia Lin Hau, et al. ∙

research

∙ 08/10/2022

Robust Reinforcement Learning using Offline Data

The goal of robust reinforcement learning (RL) is to learn a policy that...

0 Kishan Panaganti, et al. ∙

research

∙ 07/01/2022

Reinforcement Learning of Multi-Domain Dialog Policies Via Action Embeddings

Learning task-oriented dialog policies via reinforcement learning typica...

5 Jorge A. Mendez, et al. ∙

research

∙ 05/31/2022

A Mixture-of-Expert Approach to RL-based Dialogue Management

Despite recent advancements in language models (LMs), their application ...

0 Yinlam Chow, et al. ∙

research

∙ 05/12/2022

Collaborative Multi-agent Stochastic Linear Bandits

We study a collaborative multi-agent stochastic linear bandit setting, w...

0 Ahmadreza Moradipari, et al. ∙

research

∙ 05/12/2022

Multi-Environment Meta-Learning in Stochastic Linear Bandits

In this work we investigate meta-learning (or learning-to-learn) approac...

0 Ahmadreza Moradipari, et al. ∙

research

∙ 05/10/2022

Efficient Risk-Averse Reinforcement Learning

In risk-averse reinforcement learning (RL), the goal is to optimize some...

9 Ido Greenberg, et al. ∙

research

∙ 02/25/2022

Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms

We study a sequential decision problem where the learner faces a sequenc...

0 MohammadJavad Azizi, et al. ∙

research

∙ 02/25/2022

Meta-Learning for Simple Regret Minimization

We develop a meta-learning framework for simple regret minimization in b...

0 MohammadJavad Azizi, et al. ∙

research

∙ 02/03/2022

Deep Hierarchy in Bandits

Mean rewards of actions are often correlated. The form of these correlat...

0 Joey Hong, et al. ∙

research

∙ 11/12/2021

Hierarchical Bayesian Bandits

Meta-, multi-task, and federated learning can be all viewed as solving s...

0 Joey Hong, et al. ∙

research

∙ 06/10/2021

Thompson Sampling with a Mixture Prior

We study Thompson sampling (TS) in online decision-making problems where...

0 Joey Hong, et al. ∙

research

∙ 06/09/2021

Parameter and Feature Selection in Stochastic Linear Bandits

We study two model selection settings in stochastic linear bandits (LB)....

0 Ahmadreza Moradipari, et al. ∙

research

∙ 06/09/2021

Fixed-Budget Best-Arm Identification in Contextual Bandits: A Static-Adaptive Algorithm

We study the problem of best-arm identification (BAI) in contextual band...

0 MohammadJavad Azizi, et al. ∙

research

∙ 03/01/2021

Adaptive Sampling for Minimax Fair Classification

Machine learning models trained on imbalanced datasets can often end up ...

0 Shubhanshu Shekhar, et al. ∙

research

∙ 12/01/2020

Non-Stationary Latent Bandits

Users of recommender systems often behave in a non-stationary fashion, d...

0 Joey Hong, et al. ∙

research

∙ 11/30/2020

Soft-Robust Algorithms for Handling Model Misspecification

In reinforcement learning, robust policies for high-stakes decision-maki...

0 Elita A. Lobo, et al. ∙

research

∙ 11/12/2020

A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges

Uncertainty quantification (UQ) plays a pivotal role in reduction of unc...

60 Moloud Abdar, et al. ∙

research

∙ 09/14/2020

Variance-Reduced Off-Policy Memory-Efficient Policy Search

Off-policy policy optimization is a challenging problem in reinforcement...

0 Daoming Lyu, et al. ∙

research

∙ 06/28/2020

Deep Bayesian Quadrature Policy Optimization

We study the problem of obtaining accurate policy gradient estimates. Th...

6 Akella Ravi Tej, et al. ∙

research

∙ 06/24/2020

Control-Aware Representations for Model-based Reinforcement Learning

A major challenge in modern reinforcement learning (RL) is efficient con...

0 Brandon Cui, et al. ∙

research

∙ 06/17/2020

Stochastic Bandits with Linear Constraints

We study a constrained contextual linear bandit setting, where the goal ...

0 Aldo Pacchiano, et al. ∙

research

∙ 06/09/2020

Variational Model-based Policy Optimization

Model-based reinforcement learning (RL) algorithms allow us to combine m...

0 Yinlam Chow, et al. ∙

research

∙ 06/06/2020

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

In this paper, we introduce proximal gradient temporal difference learni...

0 Bo Liu, et al. ∙

research

∙ 06/06/2020

Automatic Policy Synthesis to Improve the Safety of Nonlinear Dynamical Systems

Learning controllers merely based on a performance metric has been prove...

19 Arash Mehrjou, et al. ∙

research

∙ 05/20/2020

Mirror Descent Policy Optimization

We propose deep Reinforcement Learning (RL) algorithms inspired by mirro...

0 Manan Tomar, et al. ∙

research

∙ 03/06/2020

Active Model Estimation in Markov Decision Processes

We study the problem of efficient exploration in order to learn an accur...

25 Jean Tarbouriech, et al. ∙

research

∙ 03/02/2020

Predictive Coding for Locally-Linear Control

High-dimensional observations and unknown dynamics are major challenges ...

5 Rui Shu, et al. ∙

research

∙ 02/28/2020

Policy-Aware Model Learning for Policy Gradient Methods

This paper considers the problem of learning a model in model-based rein...

0 Romina Abachi, et al. ∙

research

∙ 02/08/2020

Improved Algorithms for Conservative Exploration in Bandits

In many fields such as digital marketing, healthcare, finance, and robot...

0 Evrard Garcelon, et al. ∙

research

∙ 02/08/2020

Conservative Exploration in Reinforcement Learning

While learning in an unknown Markov Decision Process (MDP), an agent sho...

0 Evrard Garcelon, et al. ∙

research

∙ 10/28/2019

Adaptive Sampling for Estimating Multiple Probability Distributions

We consider the problem of allocating samples to a finite set of discret...

0 Shubhanshu Shekhar, et al. ∙

research

∙ 10/07/2019

Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning

Multi-step greedy policies have been extensively used in model-based Rei...

0 Manan Tomar, et al. ∙

research

∙ 10/03/2019

Benchmarking Batch Deep Reinforcement Learning Algorithms

Widely-used deep reinforcement learning algorithms have been shown to fa...

0 Scott Fujimoto, et al. ∙

research

∙ 09/10/2019

Multi-Step Greedy and Approximate Real Time Dynamic Programming

Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming...

0 Yonathan Efroni, et al. ∙

research

∙ 09/04/2019

Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control

Many real-world sequential decision-making problems can be formulated as...

25 Nir Levine, et al. ∙

research

∙ 06/21/2019

Randomized Exploration in Generalized Linear Bandits

We study two randomized algorithms for generalized linear bandits, GLM-T...

2 Branislav Kveton, et al. ∙

research

∙ 06/01/2019

Active Learning for Binary Classification with Abstention

We construct and analyze active learning algorithms for the problem of b...

0 Shubhanshu Shekhar, et al. ∙

research

∙ 05/27/2019

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

State-of-the-art efficient model-based Reinforcement Learning (RL) algor...

0 Yonathan Efroni, et al. ∙

research

∙ 05/23/2019

Binary Classification with Bounded Abstention Rate

We consider the problem of binary classification with abstention in the ...

0 Shubhanshu Shekhar, et al. ∙

research

∙ 03/21/2019

Perturbed-History Exploration in Stochastic Linear Bandits

We propose a new online algorithm for minimizing the cumulative regret i...

10 Branislav Kveton, et al. ∙

