Craig Boutilier

research

∙ 09/08/2023

Modeling Recommender Ecosystems: Research Challenges at the Intersection of Mechanism Design, Reinforcement Learning and Generative Models

Modern recommender systems lie at the heart of complex ecosystems that c...

0 Craig Boutilier, et al. ∙

research

∙ 09/02/2023

Content Prompting: Modeling Content Provider Dynamics to Improve User Welfare in Recommender Ecosystems

Users derive value from a recommender system (RS) only to the extent tha...

0 Siddharth Prasad, et al. ∙

research

∙ 05/25/2023

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

Learning from human feedback has been shown to improve text-to-image mod...

0 Ying Fan, et al. ∙

research

∙ 05/24/2023

Ranking with Popularity Bias: User Welfare under Self-Amplification Dynamics

While popularity bias is recognized to play a role in recommmender (and ...

0 Guy Tennenholtz, et al. ∙

research

∙ 02/23/2023

Aligning Text-to-Image Models using Human Feedback

Deep generative models have shown impressive results in text-to-image sy...

1 Kimin Lee, et al. ∙

research

∙ 02/21/2023

Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management

Reinforcement learning (RL) has shown great promise for developing dialo...

0 Dhawal Gupta, et al. ∙

research

∙ 02/04/2023

Reinforcement Learning with History-Dependent Dynamic Contexts

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a no...

0 Guy Tennenholtz, et al. ∙

research

∙ 10/27/2022

Gathering Strength, Gathering Storms: The One Hundred Year Study on Artificial Intelligence (AI100) 2021 Study Panel Report

In September 2021, the "One Hundred Year Study on Artificial Intelligenc...

5 Michael L. Littman, et al. ∙

research

∙ 07/25/2022

Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning

Despite recent advances in natural language understanding and generation...

0 Deborah Cohen, et al. ∙

research

∙ 07/20/2022

Building Human Values into Recommender Systems: An Interdisciplinary Synthesis

Recommender systems are the algorithms which select, filter, and persona...

0 Jonathan Stray, et al. ∙

research

∙ 05/31/2022

A Mixture-of-Expert Approach to RL-based Dialogue Management

Despite recent advancements in language models (LMs), their application ...

0 Yinlam Chow, et al. ∙

research

∙ 02/06/2022

Discovering Personalized Semantics for Soft Attributes in Recommender Systems using Concept Activation Vectors

Interactive recommender systems (RSs) allow users to express intent, pre...

0 Christina Göpfert, et al. ∙

research

∙ 01/24/2022

IMO^3: Interactive Multi-Objective Off-Policy Optimization

Most real-world optimization problems have multiple objectives. A system...

0 Nan Wang, et al. ∙

research

∙ 06/10/2021

Thompson Sampling with a Mixture Prior

We study Thompson sampling (TS) in online decision-making problems where...

0 Joey Hong, et al. ∙

research

∙ 03/14/2021

RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems

The development of recommender systems that optimize multi-turn interact...

0 Martin Mladenov, et al. ∙

research

∙ 02/11/2021

Meta-Thompson Sampling

Efficient exploration in multi-armed bandits is a fundamental online lea...

0 Branislav Kveton, et al. ∙

research

∙ 12/01/2020

Non-Stationary Latent Bandits

Users of recommender systems often behave in a non-stationary fashion, d...

0 Joey Hong, et al. ∙

research

∙ 07/31/2020

Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach

Most recommender systems (RS) research assumes that a user's utility can...

4 Martin Mladenov, et al. ∙

research

∙ 06/15/2020

Latent Bandits Revisited

A latent bandit problem is one in which the learning agent knows the arm...

0 Joey Hong, et al. ∙

research

∙ 06/09/2020

Differentiable Meta-Learning in Contextual Bandits

We study a contextual bandit setting where the learning agent has access...

0 Branislav Kveton, et al. ∙

research

∙ 02/27/2020

ConQUR: Mitigating Delusional Bias in Deep Q-learning

Delusional bias is a fundamental source of error in approximate Q-learni...

20 Andy Su, et al. ∙

research

∙ 02/17/2020

Differentiable Bandit Exploration

We learn bandit policies that maximize the average reward over bandit in...

22 Craig Boutilier, et al. ∙

research

∙ 02/12/2020

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

Deep Reinforcement Learning (RL) is proven powerful for decision making ...

11 Ge Liu, et al. ∙

research

∙ 02/08/2020

BRPO: Batch Residual Policy Optimization

In batch reinforcement learning (RL), one often constrains a learned pol...

11 Sungryull Sohn, et al. ∙

research

∙ 11/20/2019

Gradient-based Optimization for Bayesian Preference Elicitation

Effective techniques for eliciting user preferences have taken on added ...

11 Ivan Vendrov, et al. ∙

research

∙ 09/26/2019

CAQL: Continuous Action Q-Learning

Value-based reinforcement learning (RL) methods like Q-learning have sho...

29 Moonkyung Ryu, et al. ∙

research

∙ 09/11/2019

RecSim: A Configurable Simulation Platform for Recommender Systems

We propose RecSim, a configurable platform for authoring simulation envi...

7 Eugene Ie, et al. ∙

research

∙ 06/21/2019

Randomized Exploration in Generalized Linear Bandits

We study two randomized algorithms for generalized linear bandits, GLM-T...

2 Branislav Kveton, et al. ∙

research

∙ 05/29/2019

Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology

Most practical recommender systems focus on estimating immediate user en...

1 Eugene Ie, et al. ∙

research

∙ 05/29/2019

Advantage Amplification in Slowly Evolving Latent-State Environments

Latent-state environments with long horizons, such as those faced by rec...

0 Martin Mladenov, et al. ∙

research

∙ 03/21/2019

Perturbed-History Exploration in Stochastic Linear Bandits

We propose a new online algorithm for minimizing the cumulative regret i...

10 Branislav Kveton, et al. ∙

research

∙ 02/26/2019

Perturbed-History Exploration in Stochastic Multi-Armed Bandits

We propose an online algorithm for cumulative regret minimization in a s...

4 Branislav Kveton, et al. ∙

research

∙ 10/04/2018

Seq2Slate: Re-ranking and Slate Optimization with RNNs

Ranking is a central task in machine learning and information retrieval....

12 Irwan Bello, et al. ∙

research

∙ 05/07/2018

Planning and Learning with Stochastic Action Sets

In many practical uses of reinforcement learning (RL) the set of actions...

0 Craig Boutilier, et al. ∙

research

∙ 04/13/2013

Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (2000)

This is the Proceedings of the Sixteenth Conference on Uncertainty in Ar...

0 Craig Boutilier, et al. ∙

research

∙ 03/13/2013

Modal Logics for Qualitative Possibility and Beliefs

Possibilistic logic has been proposed as a numerical formalism for reaso...

0 Craig Boutilier, et al. ∙

research

∙ 03/06/2013

The Probability of a Possibility: Adding Uncertainty to Default Rules

We present a semantics for adding uncertainty to conditional logics for ...

0 Craig Boutilier, et al. ∙

research

∙ 02/27/2013

Integrating Planning and Execution in Stochastic Domains

We investigate planning in time-critical domains represented as Markov D...

0 Richard Dearden, et al. ∙

research

∙ 02/13/2013

Context-Specific Independence in Bayesian Networks

Bayesian networks provide a language for qualitatively representing the ...

0 Craig Boutilier, et al. ∙

research

∙ 02/06/2013

Structured Arc Reversal and Simulation of Dynamic Probabilistic Networks

We present an algorithm for arc reversal in Bayesian networks with tree-...

0 Adrian Y. W. Cheuk, et al. ∙

research

∙ 02/06/2013

Correlated Action Effects in Decision Theoretic Regression

Much recent research in decision theoretic planning has adopted Markov d...

0 Craig Boutilier, et al. ∙

research

∙ 01/30/2013

Hierarchical Solution of Markov Decision Processes using Macro-actions

We investigate the use of temporally abstract actions, or macro-actions,...

0 Milos Hauskrecht, et al. ∙

research

∙ 01/30/2013

Structured Reachability Analysis for Markov Decision Processes

Recent research in decision theoretic planning has focussed on making th...

0 Craig Boutilier, et al. ∙

research

∙ 01/23/2013

SPUDD: Stochastic Planning using Decision Diagrams

Markov decisions processes (MDPs) are becoming increasing popular as mod...

0 Jesse Hoey, et al. ∙

research

∙ 01/23/2013

Continuous Value Function Approximation for Sequential Bidding Policies

Market-based mechanisms such as auctions are being studied as an appropr...

0 Craig Boutilier, et al. ∙

research

∙ 01/23/2013

Reasoning With Conditional Ceteris Paribus Preference Statem

In many domains it is desirable to assess the preferences of users in a ...

0 Craig Boutilier, et al. ∙

research

∙ 01/16/2013

Value-Directed Belief State Approximation for POMDPs

We consider the problem belief-state monitoring for the purposes of impl...

0 Pascal Poupart, et al. ∙

research

∙ 01/16/2013

Approximately Optimal Monitoring of Plan Preconditions

Monitoring plan preconditions can allow for replanning when a preconditi...

0 Craig Boutilier, et al. ∙

research

∙ 01/10/2013

Value-Directed Sampling Methods for POMDPs

We consider the problem of approximate belief-state monitoring using par...

0 Pascal Poupart, et al. ∙

research

∙ 01/10/2013

Vector-space Analysis of Belief-state Approximation for POMDPs

We propose a new approach to value-directed belief state approximation f...

0 Pascal Poupart, et al. ∙

Craig Boutilier

Featured Co-authors

Sign in with Google

Consider DeepAI Pro