
Thompson Sampling with a Mixture Prior
We study Thompson sampling (TS) in online decision-making problems where...

RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems
The development of recommender systems that optimize multi-turn interact...

Meta-Thompson Sampling
Efficient exploration in multi-armed bandits is a fundamental online lea...

Non-Stationary Latent Bandits
Users of recommender systems often behave in a non-stationary fashion, d...

Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach
Most recommender systems (RS) research assumes that a user's utility can...

Latent Bandits Revisited
A latent bandit problem is one in which the learning agent knows the arm...

Differentiable Meta-Learning in Contextual Bandits
We study a contextual bandit setting where the learning agent has access...

ConQUR: Mitigating Delusional Bias in Deep Q-learning
Delusional bias is a fundamental source of error in approximate Q-learni...

Differentiable Bandit Exploration
We learn bandit policies that maximize the average reward over bandit in...

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing
Deep Reinforcement Learning (RL) is proven powerful for decision making ...

BRPO: Batch Residual Policy Optimization
In batch reinforcement learning (RL), one often constrains a learned pol...

Gradient-based Optimization for Bayesian Preference Elicitation
Effective techniques for eliciting user preferences have taken on added ...

CAQL: Continuous Action Q-Learning
Value-based reinforcement learning (RL) methods like Q-learning have sho...

RecSim: A Configurable Simulation Platform for Recommender Systems
We propose RecSim, a configurable platform for authoring simulation envi...

Randomized Exploration in Generalized Linear Bandits
We study two randomized algorithms for generalized linear bandits, GLM-T...

Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology
Most practical recommender systems focus on estimating immediate user en...

Advantage Amplification in Slowly Evolving Latent-State Environments
Latent-state environments with long horizons, such as those faced by rec...

Perturbed-History Exploration in Stochastic Linear Bandits
We propose a new online algorithm for minimizing the cumulative regret i...

Perturbed-History Exploration in Stochastic Multi-Armed Bandits
We propose an online algorithm for cumulative regret minimization in a s...

Seq2Slate: Re-ranking and Slate Optimization with RNNs
Ranking is a central task in machine learning and information retrieval....

Planning and Learning with Stochastic Action Sets
In many practical uses of reinforcement learning (RL) the set of actions...

Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (2000)
This is the Proceedings of the Sixteenth Conference on Uncertainty in Ar...

Modal Logics for Qualitative Possibility and Beliefs
Possibilistic logic has been proposed as a numerical formalism for reaso...

The Probability of a Possibility: Adding Uncertainty to Default Rules
We present a semantics for adding uncertainty to conditional logics for ...

Integrating Planning and Execution in Stochastic Domains
We investigate planning in time-critical domains represented as Markov D...

Context-Specific Independence in Bayesian Networks
Bayesian networks provide a language for qualitatively representing the ...

Structured Arc Reversal and Simulation of Dynamic Probabilistic Networks
We present an algorithm for arc reversal in Bayesian networks with tree...

Correlated Action Effects in Decision Theoretic Regression
Much recent research in decision theoretic planning has adopted Markov d...

Hierarchical Solution of Markov Decision Processes using Macro-actions
We investigate the use of temporally abstract actions, or macro-actions,...

Structured Reachability Analysis for Markov Decision Processes
Recent research in decision theoretic planning has focussed on making th...

SPUDD: Stochastic Planning using Decision Diagrams
Markov decision processes (MDPs) are becoming increasingly popular as mod...

Continuous Value Function Approximation for Sequential Bidding Policies
Market-based mechanisms such as auctions are being studied as an appropr...

Reasoning With Conditional Ceteris Paribus Preference Statem
In many domains it is desirable to assess the preferences of users in a ...

Value-Directed Belief State Approximation for POMDPs
We consider the problem of belief-state monitoring for the purposes of impl...

Approximately Optimal Monitoring of Plan Preconditions
Monitoring plan preconditions can allow for replanning when a preconditi...

Value-Directed Sampling Methods for POMDPs
We consider the problem of approximate belief-state monitoring using par...

Vector-space Analysis of Belief-state Approximation for POMDPs
We propose a new approach to value-directed belief state approximation f...

UCP-Networks: A Directed Graphical Representation of Conditional Utilities
We propose a new directed graphical representation of utility functions,...

Active Collaborative Filtering
Collaborative filtering (CF) allows the preferences of multiple users to...

Approximate Linear Programming for First-order MDPs
We introduce a new approximate solution technique for first-order Markov...

Local Utility Elicitation in GAI Models
Structured utility models are essential for the effective representation...

Active Learning for Matching Problems
Effective learning of user preferences is critical to easing user burden...

Toward Experiential Utility Elicitation for Interface Customization
User preferences for automated assistance often vary widely, depending o...

Regret-based Reward Elicitation for Markov Decision Processes
The specification of a Markov decision process (MDP) can be difficult. Re...

A Framework for Optimizing Paper Matching
At the heart of many scientific conferences is the problem of matching s...

Eliciting Forecasts from Self-interested Experts: Scoring Rules for Decision Makers
Scoring rules for eliciting expert predictions of random variables are u...
Craig Boutilier
Principal Scientist at Google & Professor at University of Toronto