Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback

by Carolin Lawrence et al.
University of Heidelberg

Counterfactual learning from human bandit feedback describes a scenario where user feedback on the quality of outputs of a historic system is logged and used to improve a target system. We show how to apply this learning framework to neural semantic parsing. From a machine learning perspective, the key challenge lies in a proper reweighting of the estimator so as to avoid known degeneracies in counterfactual learning, while still being applicable to stochastic gradient optimization. To conduct experiments with human users, we devise an easy-to-use interface to collect human feedback on semantic parses. Our work is the first to show that semantic parsers can be improved significantly by counterfactual learning from logged human feedback data.
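The reweighting mentioned above can be illustrated with a minimal sketch. Assuming deterministic logging (the historic system output exactly one parse per input, so propensities are 1), a self-normalized estimator divides the reward-weighted probability mass the target model assigns to the logged outputs by the total mass it assigns to them, which avoids the degenerate solution of simply shrinking all probabilities. The function name and interface here are illustrative, not the paper's code:

```python
import numpy as np

def reweighted_estimate(target_probs, rewards):
    """Self-normalized counterfactual estimate of expected reward.

    target_probs: probabilities pi_w(y_t | x_t) that the target model
        assigns to the logged outputs (deterministic logging, propensity 1).
    rewards: logged human feedback values delta_t, e.g. in [0, 1].
    """
    target_probs = np.asarray(target_probs, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    # Normalizing by the total probability mass on the logged outputs
    # makes the estimate invariant to uniformly scaling all probabilities,
    # which removes the incentive to degenerately lower them.
    return float(np.sum(rewards * target_probs) / np.sum(target_probs))

# If the target model concentrates mass on the well-rated output,
# the estimated reward rises accordingly.
print(reweighted_estimate([0.5, 0.5], [1.0, 0.0]))  # 0.5
print(reweighted_estimate([0.9, 0.1], [1.0, 0.0]))  # 0.9
```

In stochastic gradient training this normalizer is not available per minibatch; the paper's framing addresses exactly that tension between reweighting and SGD applicability.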




