Yasin Abbasi-Yadkori

Context-lumpable stochastic bandits

Chung-Wei Lee, et al. ∙ 06/22/2023

We consider a contextual bandit problem with S contexts and A actions. I...

Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms

MohammadJavad Azizi, et al. ∙ 02/25/2022

We study a sequential decision problem where the learner faces a sequenc...

A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits

Yasin Abbasi-Yadkori, et al. ∙ 01/17/2022

We study the non-stationary stochastic multi-armed bandit problem, where...

Efficient Local Planning with Linear Function Approximation

Dong Yin, et al. ∙ 08/12/2021

We study query and computationally efficient planning algorithms with li...

Parameter and Feature Selection in Stochastic Linear Bandits

Ahmadreza Moradipari, et al. ∙ 06/09/2021

We study two model selection settings in stochastic linear bandits (LB)....

Improved Regret Bound and Experience Replay in Regularized Policy Iteration

Nevena Lazic, et al. ∙ 02/25/2021

In this work, we study algorithms for learning in infinite-horizon undis...

Optimization Issues in KL-Constrained Approximate Policy Iteration

Nevena Lazic, et al. ∙ 02/11/2021

Many reinforcement learning algorithms can be seen as versions of approx...

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function

Gellért Weisz, et al. ∙ 02/03/2021

We consider the problem of local planning in fixed-horizon Markov Decisi...

The Elliptical Potential Lemma Revisited

Alexandra Carpentier, et al. ∙ 10/20/2020

This note proposes a new proof and new perspectives on the so-called Ell...

Regret Balancing for Bandit and RL Model Selection

Yasin Abbasi-Yadkori, et al. ∙ 06/09/2020

We consider model selection in stochastic bandit and reinforcement learn...

Sample Efficient Graph-Based Optimization with Noisy Observations

Tan Nguyen, et al. ∙ 06/04/2020

We study sample complexity of optimizing "hill-climbing friendly" functi...

Model Selection in Contextual Stochastic Bandit Problems

Aldo Pacchiano, et al. ∙ 03/03/2020

We study model selection in stochastic bandit problems. Our approach rel...

Provably Efficient Adaptive Approximate Policy Iteration

Botao Hao, et al. ∙ 02/08/2020

Model-free reinforcement learning algorithms combined with value functio...

Exploration-Enhanced POLITEX

Yasin Abbasi-Yadkori, et al. ∙ 08/27/2019

We study algorithms for average-cost reinforcement learning problems wit...

Thompson Sampling and Approximate Inference

My Phan, et al. ∙ 08/14/2019

We study the effects of approximate inference on the performance of Thom...

Bootstrapping Upper Confidence Bound

Botao Hao, et al. ∙ 06/12/2019

The Upper Confidence Bound (UCB) method is arguably the most celebrated one...

Large-Scale Markov Decision Problems via the Linear Programming Dual

Yasin Abbasi-Yadkori, et al. ∙ 01/06/2019

We consider the problem of controlling a fully specified Markov decision...

New Insights into Bootstrapping for Bandits

Sharan Vaswani, et al. ∙ 05/24/2018

We investigate the use of bootstrapping in the bandit setting. We first ...

Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting

Xiang Cheng, et al. ∙ 05/04/2018

We study the problem of sampling from a distribution where the negative ...

Offline Evaluation of Ranking Policies with Click Models

Yasin Abbasi-Yadkori, et al. ∙ 04/27/2018

Many web systems rank and present a list of items to users, from recomme...

Regret Bounds for Model-Free Linear Quadratic Control

Yasin Abbasi-Yadkori, et al. ∙ 04/17/2018

Model-free approaches for reinforcement learning (RL) and continuous con...

Optimizing over a Restricted Policy Class in Markov Decision Processes

Ershad Banijamali, et al. ∙ 02/26/2018

We address the problem of finding an optimal policy in a Markov decision...

A Continuation Method for Discrete Optimization and its Application to Nearest Neighbor Classification

Ali Shameli, et al. ∙ 02/10/2018

The continuation method is a popular approach in non-convex optimization...

Stochastic Low-Rank Bandits

Branislav Kveton, et al. ∙ 12/13/2017

Many problems in computer vision and recommender systems involve low-ran...

Posterior Sampling for Large Scale Reinforcement Learning

Georgios Theocharous, et al. ∙ 11/21/2017

Posterior sampling for reinforcement learning (PSRL) is a popular algori...

Conservative Contextual Linear Bandits

Abbas Kazerouni, et al. ∙ 11/19/2016

Safety is a desirable property that can immensely increase the applicabi...

Hit-and-Run for Sampling and Planning in Non-Convex Spaces

Yasin Abbasi-Yadkori, et al. ∙ 10/19/2016

We propose the Hit-and-Run algorithm for planning and sampling problems ...

Online learning in MDPs with side information

Yasin Abbasi-Yadkori, et al. ∙ 06/26/2014

We study online learning of finite Markov decision process (MDP) problem...

Linear Programming for Large-Scale Markov Decision Problems

Yasin Abbasi-Yadkori, et al. ∙ 02/27/2014

We consider the problem of controlling a Markov decision process (MDP) w...

Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions

Yasin Abbasi-Yadkori, et al. ∙ 03/12/2013

We study the problem of learning Markov decision processes with finite s...

Improved Mean and Variance Approximations for Belief Net Responses via Network Doubling

Peter Hooper, et al. ∙ 05/09/2012

A Bayesian belief network models a joint distribution with a directed a...

Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems

Yasin Abbasi-Yadkori, et al. ∙ 02/14/2011

The analysis of online least squares estimation is at the heart of many ...
