Evaluating Compositionality in Sentence Embeddings

02/12/2018
by Ishita Dasgupta, et al.

An important frontier in the quest for human-like AI is compositional semantics: how do we design systems that understand an infinite number of expressions built from a finite vocabulary? Recent research has attempted to solve this problem by using deep neural networks to learn vector space embeddings of sentences, which then serve as input to supervised learning problems like paraphrase detection and sentiment analysis. Here we focus on 'natural language inference' (NLI) as a critical test of a system's capacity for semantic compositionality. In the NLI task, sentence pairs are assigned one of three categories: entailment, contradiction, or neutral. We present a new set of NLI sentence pairs that cannot be solved using only word-level knowledge and instead require some degree of compositionality. We use state-of-the-art sentence embeddings trained on NLI (InferSent; Conneau et al., 2017) and find that performance on our new dataset is poor, indicating that the representations learned by this model fail to capture the needed compositionality. We analyze some of the decision rules learned by InferSent and find that they are largely driven by simple word-level heuristics that are ecologically valid in the SNLI dataset on which InferSent is trained. Further, we find that augmenting the training data with our new dataset improves performance on a held-out test set without loss of performance on the SNLI test set. This highlights the importance of structured datasets both for better understanding AI systems and for improving their performance.
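To make the setup concrete, below is a minimal sketch (in PyTorch) of how fixed-size sentence embeddings are typically used for the three-way NLI task. The encoder is a stand-in rather than the actual pretrained InferSent model; the feature combination (concatenation, absolute difference, element-wise product) follows the scheme described in Conneau et al. (2017), while the dimensions and classifier architecture here are illustrative placeholders, not the paper's exact configuration.

```python
# Sketch of an NLI classifier over sentence embeddings.
# A sentence encoder (e.g. InferSent) maps each sentence to a fixed-size
# vector; the premise and hypothesis vectors are combined and fed to a
# small classifier that predicts entailment / contradiction / neutral.
# The random embeddings below are stand-ins; plug in a real encoder in practice.

import torch
import torch.nn as nn

EMB_DIM = 4096   # InferSent produces 4096-d sentence vectors; adjust to your encoder
N_CLASSES = 3    # entailment, contradiction, neutral


class NLIClassifier(nn.Module):
    """Combines two sentence embeddings and predicts an NLI label."""

    def __init__(self, emb_dim=EMB_DIM, hidden=512, n_classes=N_CLASSES):
        super().__init__()
        # Input features: [u; v; |u - v|; u * v]  ->  4 * emb_dim dimensions
        self.mlp = nn.Sequential(
            nn.Linear(4 * emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, u, v):
        feats = torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)
        return self.mlp(feats)


if __name__ == "__main__":
    # Stand-in embeddings for a batch of 2 premise/hypothesis pairs.
    premise_emb = torch.randn(2, EMB_DIM)
    hypothesis_emb = torch.randn(2, EMB_DIM)

    model = NLIClassifier()
    logits = model(premise_emb, hypothesis_emb)   # shape: (2, 3)
    labels = ["entailment", "contradiction", "neutral"]
    print([labels[i] for i in logits.argmax(dim=-1)])
```

Because the classifier only ever sees the two sentence vectors, any compositional structure it exploits must already be encoded by the sentence encoder, which is exactly what the new dataset is designed to probe.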

Related research

pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference (10/20/2018)
Reasoning about implied relationships (e.g. paraphrastic, common sense, ...

Learning semantic sentence representations from visually grounded language without lexical knowledge (03/27/2019)
Current approaches to learning semantic representations of sentences oft...

Towards Universal Paraphrastic Sentence Embeddings (11/25/2015)
We consider the problem of learning general-purpose, paraphrastic senten...

Joint Learning of Sentence Embeddings for Relevance and Entailment (05/16/2016)
We consider the problem of Recognizing Textual Entailment within an Info...

Shortcut-Stacked Sentence Encoders for Multi-Domain Inference (08/07/2017)
We present a simple sequential sentence encoder for multi-domain natural...

Learning Semantic Sentence Embeddings using Pair-wise Discriminator (06/03/2018)
In this paper, we propose a method for obtaining sentence-level embeddin...

PAWS: Paraphrase Adversaries from Word Scrambling (04/01/2019)
Existing paraphrase identification datasets lack sentence pairs that hav...
