Lihong Li

research

∙ 07/16/2023

MESOB: Balancing Equilibria Social Optimality

Motivated by bid recommendation in online ad auctions, this paper consid...

0 Xin Guo, et al. ∙

research

∙ 12/29/2022

Offline Policy Optimization in RL with Variance Regularization

Learning policies from fixed offline datasets is a key challenge to scal...

0 Riashat Islam, et al. ∙

research

∙ 10/14/2022

A Reinforcement Learning Approach to Estimating Long-term Treatment Effects

Randomized experiments (a.k.a. A/B tests) are a powerful tool for estima...

0 Ziyang Tang, et al. ∙

research

∙ 10/07/2021

Understanding Domain Randomization for Sim-to-real Transfer

Reinforcement learning encounters many challenges when applied directly ...

0 Xiaoyu Chen, et al. ∙

research

∙ 07/01/2021

A Map of Bandits for E-commerce

The rich body of Bandit literature not only offers a diverse toolbox of ...

5 YI LIU, et al. ∙

research

∙ 04/06/2021

On the Optimality of Batch Policy Optimization Algorithms

Batch policy optimization considers leveraging existing data for policy ...

0 Chenjun Xiao, et al. ∙

research

∙ 02/08/2021

Near-optimal Representation Learning for Linear Bandits and Linear RL

This paper studies representation learning for multi-task linear bandits...

0 Jiachen Hu, et al. ∙

research

∙ 10/22/2020

CoinDICE: Off-Policy Confidence Interval Estimation

We study high-confidence behavior-agnostic off-policy evaluation in rein...

0 Bo Dai, et al. ∙

research

∙ 10/02/2020

Neural Thompson Sampling

Thompson Sampling (TS) is one of the most effective algorithms for solvi...

7 Weitong Zhang, et al. ∙

research

∙ 08/31/2020

Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL

Reinforcement learning (RL) in episodic, factored Markov decision proces...

9 Xiaoyu Chen, et al. ∙

research

∙ 07/27/2020

Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders

Off-policy evaluation (OPE) in reinforcement learning is an important pr...

5 Andrew Bennett, et al. ∙

research

∙ 07/07/2020

Off-Policy Evaluation via the Regularized Lagrangian

The recently proposed distribution correction estimation (DICE) family o...

4 Mengjiao Yang, et al. ∙

research

∙ 03/24/2020

Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning

Off-policy estimation for long-horizon problems is important in many rea...

7 Ali Mousavi, et al. ∙

research

∙ 03/02/2020

Batch Stationary Distribution Estimation

We consider the problem of approximating the stationary distribution of ...

7 Junfeng Wen, et al. ∙

research

∙ 02/21/2020

GenDICE: Generalized Offline Estimation of Stationary Values

An important problem that arises in reinforcement learning and Monte Car...

11 Ruiyi Zhang, et al. ∙

research

∙ 02/12/2020

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

Deep Reinforcement Learning (RL) is proven powerful for decision making ...

11 Ge Liu, et al. ∙

research

∙ 12/04/2019

AlgaeDICE: Policy Gradient from Arbitrary Experience

In many real-world applications of reinforcement learning (RL), interact...

0 Ofir Nachum, et al. ∙

research

∙ 11/11/2019

Neural Contextual Bandits with Upper Confidence Bound-Based Exploration

We study the stochastic contextual bandit problem, where the reward is g...

16 Dongruo Zhou, et al. ∙

research

∙ 10/16/2019

Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation

Infinite horizon off-policy policy evaluation is a highly challenging ta...

0 Ziyang Tang, et al. ∙

research

∙ 06/21/2019

Randomized Exploration in Generalized Linear Bandits

We study two randomized algorithms for generalized linear bandits, GLM-T...

2 Branislav Kveton, et al. ∙

research

∙ 06/10/2019

DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections

In many real-world reinforcement learning applications, access to the en...

0 Ofir Nachum, et al. ∙

research

∙ 05/25/2019

A Kernel Loss for Solving the Bellman Equation

Value function learning plays a central role in many state-of-the-art re...

0 Yihao Feng, et al. ∙

research

∙ 04/26/2019

Neural Logic Machines

We propose the Neural Logic Machine (NLM), a neural-symbolic architectur...

0 Honghua Dong, et al. ∙

research

∙ 11/07/2018

Policy Certificates: Towards Accountable Reinforcement Learning

The performance of a reinforcement learning algorithm can vary drastical...

0 Christoph Dann, et al. ∙

research

∙ 10/29/2018

Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation

We consider the off-policy estimation problem of estimating the expected...

0 Qiang Liu, et al. ∙

research

∙ 10/29/2018

Adversarial Attacks on Stochastic Bandits

We study adversarial attacks that manipulate the reward signals to contr...

0 Kwang-Sung Jun, et al. ∙

research

∙ 09/21/2018

Neural Approaches to Conversational AI

The present paper surveys neural approaches to conversational AI that ha...

0 Jianfeng Gao, et al. ∙

research

∙ 08/17/2018

Data Poisoning Attacks in Contextual Bandits

We study offline data poisoning attacks in contextual bandits, a class o...

0 Yuzhe Ma, et al. ∙

research

∙ 04/27/2018

Scalable Bilinear π Learning Using State and Action Features

Approximate linear programming (ALP) represents one of the major algorit...

0 Yichen Chen, et al. ∙

research

∙ 04/20/2018

Subgoal Discovery for Hierarchical Dialogue Policy Learning

Developing conversational agents to engage in complex dialogues is chall...

0 Da Tang, et al. ∙

research

∙ 12/29/2017

Smoothed Dual Embedding Control

We revisit the Bellman optimality equation with Nesterov's smoothing tec...

0 Bo Dai, et al. ∙

research

∙ 12/29/2017

Boosting the Actor with Dual Critic

This paper proposes a new actor-critic-style algorithm called Dual Actor...

0 Bo Dai, et al. ∙

research

∙ 11/15/2017

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

We present a new algorithm that significantly improves the efficiency of...

0 Zachary Lipton, et al. ∙

research

∙ 04/10/2017

Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning

Building a dialogue agent to fulfill complex tasks, such as travel plann...

0 Baolin Peng, et al. ∙

research

∙ 03/21/2017

Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems

Language understanding is a key component in a spoken dialogue system. I...

0 Xiujun Li, et al. ∙

research

∙ 03/03/2017

End-to-End Task-Completion Neural Dialogue Systems

One of the major drawbacks of modularized task-completion dialogue syste...

0 Xiujun Li, et al. ∙

research

∙ 02/28/2017

Provably Optimal Algorithms for Generalized Linear Contextual Bandits

Contextual bandits are widely used in Internet services from news recomm...

0 Lihong Li, et al. ∙

research

∙ 02/28/2017

Scaffolding Networks: Incremental Learning and Teaching Through Questioning

We introduce a new paradigm of learning for reasoning, understanding, an...

0 Asli Celikyilmaz, et al. ∙

research

∙ 02/25/2017

Stochastic Variance Reduction Methods for Policy Evaluation

Policy evaluation is a crucial step in many reinforcement-learning proce...

0 Simon S. Du, et al. ∙

research

∙ 12/17/2016

A User Simulator for Task-Completion Dialogues

Despite widespread interests in reinforcement-learning for task-oriented...

0 Xiujun Li, et al. ∙

research

∙ 11/06/2016

Neuro-Symbolic Program Synthesis

Recent years have seen the proposal of a number of neural architectures ...

0 Emilio Parisotto, et al. ∙

research

∙ 11/03/2016

Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear

To use deep reinforcement learning in the wild, we might hope for an age...

0 Zachary C Lipton, et al. ∙

research

∙ 09/03/2016

Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access

This paper proposes KB-InfoBot -- a multi-turn dialogue agent which help...

0 Bhuwan Dhingra, et al. ∙

research

∙ 06/12/2016

Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads

We introduce an online popularity prediction and tracking task as a benc...

0 Ji He, et al. ∙

research

∙ 11/14/2015

Deep Reinforcement Learning with a Natural Language Action Space

This paper introduces a novel architecture for reinforcement learning wi...

0 Ji He, et al. ∙

research

∙ 11/11/2015

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

We study the problem of off-policy value evaluation in reinforcement lea...

0 Nan Jiang, et al. ∙

research

∙ 09/10/2015

Recurrent Reinforcement Learning: A Hybrid Approach

Successful applications of reinforcement learning in real-world problems...

0 Xiujun Li, et al. ∙

research

∙ 06/10/2015

The Online Coupon-Collector Problem and Its Application to Lifelong Reinforcement Learning

Transferring knowledge across a sequence of related tasks is an importan...

0 Emma Brunskill, et al. ∙

research

∙ 06/10/2015

On the Prior Sensitivity of Thompson Sampling

The empirically successful Thompson Sampling algorithm for stochastic ba...

0 Che-Yu Liu, et al. ∙

research

∙ 06/10/2015

An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives

We consider a contextual version of multi-armed bandit problem with glob...

0 Shipra Agrawal, et al. ∙
