Can NLP Models 'Identify', 'Distinguish', and 'Justify' Questions that Don't have a Definitive Answer?

09/08/2023
by   Ayushi Agarwal, et al.

Though state-of-the-art (SOTA) NLP systems have achieved remarkable performance on a variety of language understanding tasks, they primarily focus on questions that have a correct and definitive answer. However, in real-world applications, users often ask questions that don't have a definitive answer. Incorrectly answering such questions certainly hampers a system's reliability and trustworthiness. Can SOTA models accurately identify such questions and provide a reasonable response? To investigate the above question, we introduce QnotA, a dataset consisting of five different categories of questions that don't have definitive answers. Furthermore, for each QnotA instance, we also provide a corresponding QA instance, i.e. an alternate question that "can be" answered. With this data, we formulate three evaluation tasks that test a system's ability to 'identify', 'distinguish', and 'justify' QnotA questions. Through comprehensive experiments, we show that even SOTA models, including GPT-3 and Flan T5, do not fare well on these tasks and lag considerably behind the human performance baseline. We conduct a thorough analysis which further leads to several interesting findings. Overall, we believe our work and findings will encourage and facilitate further research in this important area and help develop more robust models.

