Will it Unblend?

09/18/2020
by Yuval Pinter, et al.

Natural language processing systems often struggle with out-of-vocabulary (OOV) terms, which do not appear in training data. Blends, such as "innoventor", are one particularly challenging class of OOV, as they are formed by fusing together two or more bases that relate to the intended meaning in unpredictable manners and degrees. In this work, we run experiments on a novel dataset of English OOV blends to quantify the difficulty of interpreting the meanings of blends by large-scale contextual language models such as BERT. We first show that BERT's processing of these blends does not fully access the component meanings, leaving their contextual representations semantically impoverished. We find this is mostly due to the loss of characters resulting from blend formation. Then, we assess how easily different models can recognize the structure and recover the origin of blends, and find that context-aware embedding systems outperform character-level and context-free embeddings, although their results are still far from satisfactory.
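To make the tokenization issue concrete, here is a minimal sketch of WordPiece-style greedy longest-match subword segmentation, the scheme BERT uses. The toy vocabulary below is an illustrative assumption, not BERT's real vocabulary; it shows how a blend like "innoventor" (roughly "innovator" + "inventor") can fragment into pieces that expose neither base word, since blend formation deletes characters from both.

```python
# A minimal sketch of WordPiece-style greedy longest-match subword
# segmentation. NOTE: the vocabulary here is a toy assumption for
# illustration; it is not BERT's actual vocabulary.

def wordpiece_segment(word, vocab):
    """Greedily split `word` into the longest matching vocabulary
    pieces. Non-initial pieces carry the '##' continuation prefix,
    as in BERT's WordPiece tokenizer."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:  # no vocabulary piece matches: unknown token
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces

# Toy vocabulary of common English pieces; note that neither full
# base word ("innovator", "inventor") survives the blend's character loss.
vocab = {"inn", "##oven", "##tor", "##ov", "##ent", "##or", "in", "##n"}

print(wordpiece_segment("innoventor", vocab))
# → ['inn', '##oven', '##tor']
```

The resulting pieces ("inn", "##oven", "##tor") bear no transparent relation to either base, which is one way the component meanings can become inaccessible to the downstream model.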


