
Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders
Off-policy evaluation (OPE) in reinforcement learning is an important pr...

Off-Policy Evaluation via the Regularized Lagrangian
The recently proposed distribution correction estimation (DICE) family o...

Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning
Off-policy estimation for long-horizon problems is important in many rea...

Batch Stationary Distribution Estimation
We consider the problem of approximating the stationary distribution of ...

GenDICE: Generalized Offline Estimation of Stationary Values
An important problem that arises in reinforcement learning and Monte Car...

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing
Deep Reinforcement Learning (RL) is proven powerful for decision making ...

AlgaeDICE: Policy Gradient from Arbitrary Experience
In many real-world applications of reinforcement learning (RL), interact...

Neural Contextual Bandits with Upper Confidence Bound-Based Exploration
We study the stochastic contextual bandit problem, where the reward is g...

Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Infinite horizon off-policy policy evaluation is a highly challenging ta...

Randomized Exploration in Generalized Linear Bandits
We study two randomized algorithms for generalized linear bandits, GLM-T...

DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
In many real-world reinforcement learning applications, access to the en...

A Kernel Loss for Solving the Bellman Equation
Value function learning plays a central role in many state-of-the-art re...

Neural Logic Machines
We propose the Neural Logic Machine (NLM), a neural-symbolic architectur...

Policy Certificates: Towards Accountable Reinforcement Learning
The performance of a reinforcement learning algorithm can vary drastical...

Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
We consider the off-policy estimation problem of estimating the expected...

Adversarial Attacks on Stochastic Bandits
We study adversarial attacks that manipulate the reward signals to contr...

Neural Approaches to Conversational AI
The present paper surveys neural approaches to conversational AI that ha...

Data Poisoning Attacks in Contextual Bandits
We study offline data poisoning attacks in contextual bandits, a class o...

Scalable Bilinear π Learning Using State and Action Features
Approximate linear programming (ALP) represents one of the major algorit...

Subgoal Discovery for Hierarchical Dialogue Policy Learning
Developing conversational agents to engage in complex dialogues is chall...

Smoothed Dual Embedding Control
We revisit the Bellman optimality equation with Nesterov's smoothing tec...

Boosting the Actor with Dual Critic
This paper proposes a new actor-critic-style algorithm called Dual Actor...

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems
We present a new algorithm that significantly improves the efficiency of...

Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning
Building a dialogue agent to fulfill complex tasks, such as travel plann...

Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems
Language understanding is a key component in a spoken dialogue system. I...

End-to-End Task-Completion Neural Dialogue Systems
One of the major drawbacks of modularized task-completion dialogue syste...

Provably Optimal Algorithms for Generalized Linear Contextual Bandits
Contextual bandits are widely used in Internet services from news recomm...

Scaffolding Networks: Incremental Learning and Teaching Through Questioning
We introduce a new paradigm of learning for reasoning, understanding, an...

Stochastic Variance Reduction Methods for Policy Evaluation
Policy evaluation is a crucial step in many reinforcement-learning proce...

A User Simulator for Task-Completion Dialogues
Despite widespread interests in reinforcement-learning for task-oriented...

Neuro-Symbolic Program Synthesis
Recent years have seen the proposal of a number of neural architectures ...

Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear
To use deep reinforcement learning in the wild, we might hope for an age...

Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access
This paper proposes KB-InfoBot, a multi-turn dialogue agent which help...

Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads
We introduce an online popularity prediction and tracking task as a benc...

Deep Reinforcement Learning with a Natural Language Action Space
This paper introduces a novel architecture for reinforcement learning wi...

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
We study the problem of off-policy value evaluation in reinforcement lea...

Recurrent Reinforcement Learning: A Hybrid Approach
Successful applications of reinforcement learning in real-world problems...

The Online Coupon-Collector Problem and Its Application to Lifelong Reinforcement Learning
Transferring knowledge across a sequence of related tasks is an importan...

On the Prior Sensitivity of Thompson Sampling
The empirically successful Thompson Sampling algorithm for stochastic ba...

An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives
We consider a contextual version of multi-armed bandit problem with glob...

Doubly Robust Policy Evaluation and Optimization
We study sequential decision making in environments where rewards are on...

On Minimax Optimal Offline Policy Evaluation
This paper studies the off-policy evaluation problem, where one aims to ...

Counterfactual Estimation and Optimization of Click Metrics for Search Engines
Optimizing an interactive system against a predefined online metric is p...

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
We present a new algorithm for the contextual bandit learning problem, w...

Sample Complexity of Multi-task Reinforcement Learning
Transferring knowledge across a sequence of reinforcement-learning tasks...

Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits
We present and prove properties of a new offline policy evaluator for an...

Incremental Model-based Learners With Formal Learning-Time Guarantees
Model-based learning algorithms have been shown to use experience effici...

Doubly Robust Policy Evaluation and Learning
We study decision making in environments where the reward is only partia...

Refining Recency Search Results with User Click Feedback
Traditional machine-learned ranking systems for web search are often tra...

Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Contextual bandit algorithms have become popular for online recommendati...