Improving Opinion-based Question Answering Systems Through Label Error Detection and Overwrite

06/13/2023
by Xiao Yang, et al.

Label error is a ubiquitous problem in annotated data. Large amounts of label error substantially degrade the quality of deep learning models. Existing methods to tackle the label error problem largely focus on the classification task, and either rely on task-specific architectures or require non-trivial additional computation, which is undesirable or even unattainable for industry usage. In this paper, we propose LEDO: a model-agnostic and computationally efficient framework for Label Error Detection and Overwrite. LEDO is based on Monte Carlo Dropout combined with uncertainty metrics, and can be easily generalized to multiple tasks and data sets. Applying LEDO to an industry opinion-based question answering system demonstrates that it is effective at improving accuracy in all the core models. Specifically, LEDO brings gains of 1.1 for the retrieval model, 1.5 for the comprehension model, and 0.9 on top of the strong baselines with a large-scale social media dataset. Importantly, LEDO is computationally efficient compared to methods that require a change to the loss function, and cost-effective, as the resulting data can be used in the same continuous training pipeline for production. Further analysis shows that these gains come from an improved decision boundary after cleaning the label errors that existed in the training data.
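
As a rough illustration of the general idea of pairing Monte Carlo Dropout with an uncertainty metric to detect and overwrite label errors, here is a minimal NumPy sketch for a binary task. It is not the paper's implementation. The toy two-layer network, the function names `mc_dropout_probs` and `detect_and_overwrite`, the use of the standard deviation across passes as the uncertainty metric, and the thresholds `conf_threshold` and `max_std` are all assumptions made for this example.

```python
import numpy as np


def mc_dropout_probs(x, w1, w2, n_passes=50, drop_rate=0.2, seed=0):
    """Run several stochastic forward passes with dropout left ON at inference.

    x: (n, d) features; w1: (d, h) and w2: (h,) are weights of a toy network
    (hypothetical stand-in for a trained model). Returns an array of shape
    (n_passes, n) holding the positive-class probability from each pass.
    """
    rng = np.random.default_rng(seed)
    hidden = np.maximum(x @ w1, 0.0)  # ReLU hidden layer, shape (n, h)
    probs = []
    for _ in range(n_passes):
        # Keep each hidden unit with probability 1 - drop_rate.
        mask = rng.random(hidden.shape) >= drop_rate
        dropped = hidden * mask / (1.0 - drop_rate)  # inverted dropout scaling
        logits = dropped @ w2
        probs.append(1.0 / (1.0 + np.exp(-logits)))  # sigmoid
    return np.stack(probs)


def detect_and_overwrite(labels, probs, conf_threshold=0.9, max_std=0.05):
    """Flag likely label errors and overwrite them with the model's prediction.

    A label is treated as suspect when the mean prediction over all passes
    disagrees with it, is confident, and is stable across passes (low std).
    Both thresholds are assumed values that would need tuning on real data.
    """
    mean = probs.mean(axis=0)
    std = probs.std(axis=0)
    pred = (mean >= 0.5).astype(int)
    confidence = np.where(pred == 1, mean, 1.0 - mean)
    suspect = (pred != labels) & (confidence >= conf_threshold) & (std <= max_std)
    cleaned = np.where(suspect, pred, labels)
    return suspect, cleaned


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(8, 4))
    w1 = rng.normal(size=(4, 16))
    w2 = rng.normal(size=16)
    noisy_labels = rng.integers(0, 2, size=8)

    probs = mc_dropout_probs(x, w1, w2)
    suspect, cleaned = detect_and_overwrite(noisy_labels, probs)
    print("flagged indices:", np.flatnonzero(suspect))
    print("cleaned labels: ", cleaned)
```

In this sketch the cleaned labels would simply replace the original ones in the training set, so the same training pipeline can be rerun unchanged, which mirrors the cost argument made in the abstract.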


