ILDAE: Instance-Level Difficulty Analysis of Evaluation Data

03/07/2022
by Neeraj Varshney, et al.

Knowledge of questions' difficulty level helps a teacher in several ways, such as estimating students' potential quickly by asking carefully selected questions and improving the quality of an examination by modifying trivial and hard questions. Can we extract such benefits of instance difficulty in NLP? To this end, we conduct Instance-Level Difficulty Analysis of Evaluation data (ILDAE) in a large-scale setup of 23 datasets and demonstrate its five novel applications: 1) conducting efficient-yet-accurate evaluations with fewer instances, saving computational cost and time, 2) improving the quality of existing evaluation datasets by repairing erroneous and trivial instances, 3) selecting the best model based on application requirements, 4) analyzing dataset characteristics to guide future data creation, 5) estimating Out-of-Domain performance reliably. Comprehensive experiments for these applications result in several interesting findings, such as: evaluation using just 5% of the instances (selected via ILDAE) achieves as high as 0.93 Kendall correlation with evaluation using the complete dataset, and computing weighted accuracy using difficulty scores leads to 5.2% higher correlation with Out-of-Domain performance. We release the difficulty scores and hope our analyses and findings will bring more attention to this important yet understudied field of leveraging instance difficulty in evaluations.
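To make the abstract's ideas concrete, the sketch below shows one way per-instance difficulty scores could weight a model's correctness, and how a small difficulty-selected subset can be compared against the full dataset via Kendall correlation. This is a minimal illustration, not the paper's exact formulation: the weighting scheme, the hardest-instances selection heuristic, and the random stand-in data are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import kendalltau


def weighted_accuracy(correct, difficulty):
    """Difficulty-weighted accuracy: harder instances contribute more.

    `correct` is a 0/1 array of per-instance outcomes and `difficulty`
    holds difficulty scores in [0, 1]. The weighting scheme here is an
    illustrative assumption, not the paper's exact formula.
    """
    weights = difficulty / difficulty.sum()
    return float((weights * correct).sum())


# Toy setup: 3 models evaluated on 1000 instances (random stand-in data).
rng = np.random.default_rng(0)
difficulty = rng.uniform(0.0, 1.0, size=1000)            # hypothetical difficulty scores
model_correct = [rng.random(1000) < acc for acc in (0.6, 0.7, 0.8)]

# Score each model on the full set and on a 5% subset of the hardest
# instances (one possible selection heuristic, not the paper's criterion).
full_scores = [weighted_accuracy(c, difficulty) for c in model_correct]
subset_idx = np.argsort(difficulty)[-50:]
subset_scores = [weighted_accuracy(c[subset_idx], difficulty[subset_idx])
                 for c in model_correct]

# Agreement between the subset-based and full-dataset model rankings.
tau, _ = kendalltau(full_scores, subset_scores)
print(f"Kendall correlation between subset and full rankings: {tau:.2f}")
```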


05/19/2022
Let the Model Decide its Curriculum for Multitask Learning
Curriculum learning strategies in prior multi-task learning approaches a...

09/08/2023
Can NLP Models 'Identify', 'Distinguish', and 'Justify' Questions that Don't have a Definitive Answer?
Though state-of-the-art (SOTA) NLP systems have achieved remarkable perf...

07/20/2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Evaluation of Large Language Models (LLMs) is challenging because aligni...

08/14/2022
Text Difficulty Study: Do machines behave the same as humans regarding text difficulty?
Given a task, human learns from easy to hard, whereas the model learns r...

07/04/2016
Modeling of Item-Difficulty for Ontology-based MCQs
Multiple choice questions (MCQs) that can be generated from a domain ont...

09/16/2016
Grammatical Templates: Improving Text Difficulty Evaluation for Language Learners
Language students are most engaged while reading texts at an appropriate...

05/05/2023
Data Complexity: A New Perspective for Analyzing the Difficulty of Defect Prediction Tasks
Defect prediction is crucial for software quality assurance and has been...
