Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback

05/03/2018
by   Carolin Lawrence, et al.
University of Heidelberg

Counterfactual learning from human bandit feedback describes a scenario where user feedback on the quality of outputs of a historic system is logged and used to improve a target system. We show how to apply this learning framework to neural semantic parsing. From a machine learning perspective, the key challenge lies in a proper reweighting of the estimator so as to avoid known degeneracies in counterfactual learning, while still being applicable to stochastic gradient optimization. To conduct experiments with human users, we devise an easy-to-use interface to collect human feedback on semantic parses. Our work is the first to show that semantic parsers can be improved significantly by counterfactual learning from logged human feedback data.
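To make the reweighting idea concrete, below is a minimal sketch of a self-normalized counterfactual estimate of expected reward over a logged minibatch. It assumes deterministic logging (propensity 1 for each logged output), so only the target model's probabilities and the human feedback scores appear; the function name and the numbers are hypothetical illustrations, not the authors' exact objective. The multiplicative normalization acts as a control variate that counters the known degeneracy of simply pushing up probability mass on high-reward logged outputs.

```python
import numpy as np

def reweighted_estimator(log_probs, rewards):
    """Self-normalized counterfactual estimate of expected reward.

    log_probs: log pi_w(y_i | x_i) under the target model for each
               logged output (deterministic logging assumed, so no
               logging propensities are needed).
    rewards:   human feedback delta_i in [0, 1] per logged output.
    """
    probs = np.exp(log_probs)
    # Normalizing by the total probability mass reweights the estimate
    # and avoids the degenerate solution of inflating all probabilities.
    return np.sum(rewards * probs) / np.sum(probs)

# Illustrative logged minibatch (hypothetical values).
log_probs = np.array([-0.5, -1.2, -2.0, -0.8])
rewards = np.array([1.0, 0.0, 1.0, 0.5])
print(round(reweighted_estimator(log_probs, rewards), 4))
```

In practice one would maximize this objective with stochastic gradients, recomputing the normalization per minibatch; the ratio form is what makes the estimator compatible with minibatched stochastic gradient optimization.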


Related research:

11/29/2018 · Counterfactual Learning from Human Proofreading Feedback for Semantic Parsing
11/23/2017 · Counterfactual Learning for Machine Translation: Degeneracies and Solutions
02/09/2015 · Counterfactual Risk Minimization: Learning from Logged Bandit Feedback
06/29/2018 · Bayesian Counterfactual Risk Minimization
08/03/2021 · Accelerating the Convergence of Human-in-the-Loop Reinforcement Learning with Counterfactual Explanations
07/24/2019 · Counterfactual Learning from Logs for Improved Ranking of E-Commerce Products
