SciNLI: A Corpus for Natural Language Inference on Scientific Text

03/13/2022
by Mobashir Sadat, et al.

Existing Natural Language Inference (NLI) datasets, while instrumental in the advancement of Natural Language Understanding (NLU) research, do not cover scientific text. In this paper, we introduce SciNLI, a large dataset for NLI that captures the formality of scientific text and contains 107,412 sentence pairs extracted from scholarly papers on NLP and computational linguistics. Because the text used in scientific literature differs vastly from everyday language in both vocabulary and sentence structure, our dataset is well suited to serve as a benchmark for evaluating scientific NLU models. Our experiments show that SciNLI is harder to classify than existing NLI datasets. Our best-performing model, based on XLNet, achieves a Macro F1 score of only 78.18, showing that there is substantial room for improvement.
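For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a Macro F1 score can be computed: F1 is calculated separately for each class and the per-class scores are averaged with equal weight. The function name `macro_f1` and the toy labels are assumptions made for illustration; they are not taken from the paper, and this is not the authors' evaluation code.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # predicted class p, but it was wrong
            fn[g] += 1   # missed an example of class g

    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with hypothetical labels, not SciNLI data
gold = ["entailment", "entailment", "neutral", "neutral"]
pred = ["entailment", "neutral", "neutral", "neutral"]
score = macro_f1(gold, pred)
print(f"Macro F1: {100 * score:.2f}")
```

Because every class contributes equally to the average, a model cannot reach a high Macro F1 by doing well only on the most frequent class. Scores are often reported on a 0 to 100 scale, as with the 78.18 above, which is why the example multiplies by 100 when printing.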

research
09/12/2022

CSL: A Large-scale Chinese Scientific Literature Dataset

Scientific literature serves as a high-quality corpus, supporting a lot ...
research
01/21/2020

AutoMATES: Automated Model Assembly from Text, Equations, and Software

Models of complicated systems can be represented in different ways - in ...
research
10/11/2017

Measurement Context Extraction from Text: Discovering Opportunities and Gaps in Earth Science

We propose Marve, a system for extracting measurement values, units, and...
research
05/23/2021

CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding

Scientific document understanding is challenging as the data is highly d...
research
07/20/2016

Constructing a Natural Language Inference Dataset using Generative Neural Networks

Natural Language Inference is an important task for Natural Language Und...
research
06/19/2023

Fine-Tuning Language Models for Scientific Writing Support

We support scientific writers in determining whether a written sentence ...
research
08/28/2019

Semantic Hypergraphs

Existing computational methods for the analysis of corpora of text in na...
