A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

04/18/2017
by Adina Williams, et al.

This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. In addition to being one of the largest corpora available for the task of NLI, at 433k examples, this corpus improves upon available resources in its coverage: it offers data from ten distinct genres of written and spoken English, making it possible to evaluate systems on nearly the full complexity of the language, and it offers an explicit setting for the evaluation of cross-genre domain adaptation.


