Mining Knowledge for Natural Language Inference from Wikipedia Categories

10/03/2020
by Mingda Chen, et al.

Accurate lexical entailment (LE) and natural language inference (NLI) often require large quantities of costly annotations. To alleviate the need for labeled data, we introduce WikiNLI: a resource for improving model performance on NLI and LE tasks. It contains 428,899 pairs of phrases constructed from naturally annotated category hierarchies in Wikipedia. We show that we can improve strong baselines such as BERT and RoBERTa by pretraining them on WikiNLI and transferring the models to downstream tasks. We conduct systematic comparisons with phrases extracted from other knowledge bases such as WordNet and Wikidata, finding that pretraining on WikiNLI gives the best performance. In addition, we construct WikiNLI in other languages and show that pretraining on these versions improves performance on NLI tasks in the corresponding languages.
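As a rough illustration of the construction described above, the sketch below mines NLI-style phrase pairs from a toy category hierarchy: a child category is paired with its parent as an entailment-like example, and two categories that share a parent are paired as a neutral-like example. The toy hierarchy, the mine_pairs helper, and the label names are illustrative assumptions, not the paper's exact data-generation procedure.

```python
# Minimal sketch of mining NLI-style phrase pairs from a category
# hierarchy. The toy hierarchy and the label names ("entailment",
# "neutral") are illustrative assumptions, not the paper's exact
# label set or filtering pipeline.

from itertools import combinations

# Toy fragment of a Wikipedia-like category hierarchy:
# child category -> list of parent categories.
CATEGORY_PARENTS = {
    "jazz pianists": ["pianists", "jazz musicians"],
    "pianists": ["keyboardists"],
    "jazz musicians": ["musicians"],
    "violinists": ["musicians"],
}


def mine_pairs(parents):
    """Yield (premise, hypothesis, label) triples.

    A child category is assumed to entail its parent (a jazz pianist
    is a pianist); two children sharing a parent are treated as a
    neutral pair.
    """
    pairs = []
    # Child -> parent pairs: entailment-like examples.
    for child, ps in parents.items():
        for p in ps:
            pairs.append((child, p, "entailment"))
    # Siblings (categories sharing a parent): neutral-like examples.
    children_of = {}
    for child, ps in parents.items():
        for p in ps:
            children_of.setdefault(p, []).append(child)
    for siblings in children_of.values():
        for a, b in combinations(siblings, 2):
            pairs.append((a, b, "neutral"))
    return pairs


if __name__ == "__main__":
    for premise, hypothesis, label in mine_pairs(CATEGORY_PARENTS):
        print(f"{premise}\t{hypothesis}\t{label}")
```

In the actual resource, such pairs are harvested at scale from Wikipedia's category graph; the paper's precise labeling and filtering steps are described in the full text.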


Related research

03/18/2021  All NLP Tasks Are Generation Tasks: A General Pretraining Framework
There have been various types of pretraining architectures including aut...

03/19/2022  Pretraining with Artificial Language: Studying Transferable Knowledge in Language Models
We investigate what kind of structural knowledge learned in neural netwo...

04/27/2019  Several Experiments on Investigating Pretraining and Knowledge-Enhanced Models for Natural Language Inference
Natural language inference (NLI) is among the most challenging tasks in ...

12/04/2022  Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE
This technical report briefly describes our JDExplore d-team's Vega v2 s...

05/17/2023  DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
The mixture proportions of pretraining data domains (e.g., Wikipedia, bo...

11/02/2020  Emergent Communication Pretraining for Few-Shot Machine Translation
While state-of-the-art models that rely upon massively multilingual pret...

10/20/2020  Natural Language Inference with Mixed Effects
There is growing evidence that the prevalence of disagreement in the raw...
