
-
Near-optimal Representation Learning for Linear Bandits and Linear RL
This paper studies representation learning for multi-task linear bandits...
-
CoinDICE: Off-Policy Confidence Interval Estimation
We study high-confidence behavior-agnostic off-policy evaluation in rein...
-
Neural Thompson Sampling
Thompson Sampling (TS) is one of the most effective algorithms for solvi...
-
Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL
Reinforcement learning (RL) in episodic, factored Markov decision proces...
-
Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders
Off-policy evaluation (OPE) in reinforcement learning is an important pr...
-
Off-Policy Evaluation via the Regularized Lagrangian
The recently proposed distribution correction estimation (DICE) family o...
-
Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning
Off-policy estimation for long-horizon problems is important in many rea...
-
Batch Stationary Distribution Estimation
We consider the problem of approximating the stationary distribution of ...
-
GenDICE: Generalized Offline Estimation of Stationary Values
An important problem that arises in reinforcement learning and Monte Car...
-
Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing
Deep Reinforcement Learning (RL) is proven powerful for decision making ...
-
AlgaeDICE: Policy Gradient from Arbitrary Experience
In many real-world applications of reinforcement learning (RL), interact...
-
Neural Contextual Bandits with Upper Confidence Bound-Based Exploration
We study the stochastic contextual bandit problem, where the reward is g...
-
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Infinite horizon off-policy policy evaluation is a highly challenging ta...
-
Randomized Exploration in Generalized Linear Bandits
We study two randomized algorithms for generalized linear bandits, GLM-T...
-
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
In many real-world reinforcement learning applications, access to the en...
-
A Kernel Loss for Solving the Bellman Equation
Value function learning plays a central role in many state-of-the-art re...
-
Neural Logic Machines
We propose the Neural Logic Machine (NLM), a neural-symbolic architectur...
-
Policy Certificates: Towards Accountable Reinforcement Learning
The performance of a reinforcement learning algorithm can vary drastical...
-
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
We consider the off-policy estimation problem of estimating the expected...
-
Adversarial Attacks on Stochastic Bandits
We study adversarial attacks that manipulate the reward signals to contr...
-
Neural Approaches to Conversational AI
The present paper surveys neural approaches to conversational AI that ha...
-
Data Poisoning Attacks in Contextual Bandits
We study offline data poisoning attacks in contextual bandits, a class o...
-
Scalable Bilinear π Learning Using State and Action Features
Approximate linear programming (ALP) represents one of the major algorit...
-
Subgoal Discovery for Hierarchical Dialogue Policy Learning
Developing conversational agents to engage in complex dialogues is chall...
-
Smoothed Dual Embedding Control
We revisit the Bellman optimality equation with Nesterov's smoothing tec...
-
Boosting the Actor with Dual Critic
This paper proposes a new actor-critic-style algorithm called Dual Actor...
-
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems
We present a new algorithm that significantly improves the efficiency of...
-
Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning
Building a dialogue agent to fulfill complex tasks, such as travel plann...
-
Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems
Language understanding is a key component in a spoken dialogue system. I...
-
End-to-End Task-Completion Neural Dialogue Systems
One of the major drawbacks of modularized task-completion dialogue syste...
-
Provably Optimal Algorithms for Generalized Linear Contextual Bandits
Contextual bandits are widely used in Internet services from news recomm...
-
Scaffolding Networks: Incremental Learning and Teaching Through Questioning
We introduce a new paradigm of learning for reasoning, understanding, an...
-
Stochastic Variance Reduction Methods for Policy Evaluation
Policy evaluation is a crucial step in many reinforcement-learning proce...
-
A User Simulator for Task-Completion Dialogues
Despite widespread interest in reinforcement learning for task-oriented...
-
Neuro-Symbolic Program Synthesis
Recent years have seen the proposal of a number of neural architectures ...
-
Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear
To use deep reinforcement learning in the wild, we might hope for an age...
-
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access
This paper proposes KB-InfoBot -- a multi-turn dialogue agent which help...
-
Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads
We introduce an online popularity prediction and tracking task as a benc...
-
Deep Reinforcement Learning with a Natural Language Action Space
This paper introduces a novel architecture for reinforcement learning wi...
-
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
We study the problem of off-policy value evaluation in reinforcement lea...
-
Recurrent Reinforcement Learning: A Hybrid Approach
Successful applications of reinforcement learning in real-world problems...
-
The Online Coupon-Collector Problem and Its Application to Lifelong Reinforcement Learning
Transferring knowledge across a sequence of related tasks is an importan...
-
On the Prior Sensitivity of Thompson Sampling
The empirically successful Thompson Sampling algorithm for stochastic ba...
-
An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives
We consider a contextual version of the multi-armed bandit problem with glob...
-
Doubly Robust Policy Evaluation and Optimization
We study sequential decision making in environments where rewards are on...
-
On Minimax Optimal Offline Policy Evaluation
This paper studies the off-policy evaluation problem, where one aims to ...
-
Counterfactual Estimation and Optimization of Click Metrics for Search Engines
Optimizing an interactive system against a predefined online metric is p...
-
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
We present a new algorithm for the contextual bandit learning problem, w...
-
Sample Complexity of Multi-task Reinforcement Learning
Transferring knowledge across a sequence of reinforcement-learning tasks...
-
Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits
We present and prove properties of a new offline policy evaluator for an...