Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment

04/22/2018
by Masatoshi Tsuchiya, et al.

The quality of training data is a crucial concern whenever a learning-based approach is employed. This paper proposes a new method to investigate the quality of a large corpus designed for the recognizing textual entailment (RTE) task. The proposed method, inspired by statistical hypothesis testing, consists of two phases: the first phase introduces the predictability of textual entailment labels as a null hypothesis, which should be extremely unlikely to hold if the target corpus has no hidden bias; the second phase tests this null hypothesis using a Naive Bayes model. The experimental result on the Stanford Natural Language Inference (SNLI) corpus does not reject the null hypothesis. This indicates that the SNLI corpus contains a hidden bias that allows textual entailment labels to be predicted from hypothesis sentences alone, even when no context information is given by a premise sentence. This paper also presents the performance impact of this hidden bias on neural network (NN) models for RTE.
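The second phase described above can be sketched as a hypothesis-only classifier: a Naive Bayes model trained to predict entailment labels from hypothesis sentences with the premise withheld. The snippet below is a minimal illustration of that idea, not the paper's implementation; it uses scikit-learn's `CountVectorizer` and `MultinomialNB`, and the toy sentences and labels are invented for demonstration rather than drawn from SNLI.

```python
# Hypothesis-only baseline sketch: predict entailment labels from
# hypothesis sentences alone. If such a model beats chance on held-out
# data, the labels leak through the hypotheses -- the hidden bias the
# paper's test probes for.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy hypothesis sentences with entailment labels (illustrative only;
# negation cues, generalizations, and added details mimic annotation
# artifacts reported for crowdsourced NLI corpora).
hypotheses = [
    "A man is sleeping.",            # negation/inactivity cue
    "Nobody is outside.",            # negation cue
    "A person is outdoors.",         # generalization of a premise
    "Someone is doing something.",   # generalization of a premise
    "A man is playing chess.",       # extra unstated detail
    "A woman is winning a prize.",   # extra unstated detail
]
labels = [
    "contradiction", "contradiction",
    "entailment", "entailment",
    "neutral", "neutral",
]

# Bag-of-words features + multinomial Naive Bayes: note that no premise
# sentence is ever shown to the model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(hypotheses, labels)

# Cue words like "nobody" pull the prediction toward "contradiction"
# even though the model has never seen a premise.
print(model.predict(["Nobody is sleeping."])[0])
```

In the paper's setting, the interesting quantity is this model's accuracy on a held-out split: a score well above the majority-class baseline is evidence that the corpus carries a hidden bias exploitable without any premise.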


