Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers

09/10/2021
by Shane Storks, et al.

As large-scale, pre-trained language models achieve human-level and superhuman accuracy on existing language understanding tasks, findings of statistical bias in benchmark data and results from probing studies have recently called their true capabilities into question. For a more informative evaluation than accuracy on text classification tasks can offer, we propose evaluating systems through a novel measure of prediction coherence. We apply our framework to two existing language understanding benchmarks with different properties to demonstrate its versatility. Our experimental results show that this evaluation framework, although simple in concept and implementation, is a quick, effective, and versatile measure for gaining insight into the coherence of machines' predictions.
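To make the contrast with plain accuracy concrete, below is a minimal sketch in Python of how such a coherence measure could be computed, assuming a tiered classification task where each example carries a high-level gold label (e.g., story plausibility) plus lower-level supporting labels (e.g., which sentences conflict). The data layout, field names, and aggregation here are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TieredPrediction:
    """One example's gold labels and a system's predictions.

    Illustrative layout (an assumption, not the paper's API): a
    high-level label plus the lower-level labels that should justify it.
    """
    gold_high: int
    pred_high: int
    gold_low: List[int]
    pred_low: List[int]

def accuracy(preds: List[TieredPrediction]) -> float:
    """Standard top-level accuracy: the 'tip of the iceberg'."""
    return sum(p.pred_high == p.gold_high for p in preds) / len(preds)

def coherence(preds: List[TieredPrediction]) -> float:
    """Fraction of examples whose high-level prediction is correct
    AND supported by correct lower-level predictions.

    A sketch of the general idea (correct answers should rest on
    correct supporting evidence); the paper's exact aggregation may differ.
    """
    coherent = sum(
        p.pred_high == p.gold_high and p.pred_low == p.gold_low
        for p in preds
    )
    return coherent / len(preds)

if __name__ == "__main__":
    # Toy run: two examples, both "accurate", only one also coherent.
    preds = [
        TieredPrediction(gold_high=1, pred_high=1,
                         gold_low=[0, 1], pred_low=[1, 0]),  # right answer, wrong evidence
        TieredPrediction(gold_high=0, pred_high=0,
                         gold_low=[0, 0], pred_low=[0, 0]),  # right answer, right evidence
    ]
    print(f"accuracy:  {accuracy(preds):.2f}")   # 1.00
    print(f"coherence: {coherence(preds):.2f}")  # 0.50
```

In this sketch, a system that guesses the right label without the right supporting evidence raises accuracy but not coherence, which is exactly the gap the proposed evaluation is meant to expose.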


