Unsupervised Lexical Substitution with Decontextualised Embeddings

09/17/2022
by Takashi Wada, et al.

We propose a new unsupervised method for lexical substitution using pre-trained language models. Compared to previous approaches that use the generative capability of language models to predict substitutes, our method retrieves substitutes based on the similarity of contextualised and decontextualised word embeddings, i.e. the average contextual representation of a word in multiple contexts. We conduct experiments in English and Italian, and show that our method substantially outperforms strong baselines and establishes a new state-of-the-art without any explicit supervision or fine-tuning. We further show that our method performs particularly well at predicting low-frequency substitutes, and also generates a diverse list of substitute candidates, reducing morphophonetic or morphosyntactic biases induced by article-noun agreement.
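The retrieval step described above is easy to illustrate. The sketch below is not the authors' released implementation: it assumes a generic BERT encoder (bert-base-uncased) loaded via Hugging Face transformers, mean-pools final-layer subword vectors, and uses a few hand-picked example contexts per candidate, whereas the paper averages over many corpus contexts and combines additional scoring components. The candidate words and contexts are purely illustrative.

```python
# Minimal sketch of substitute retrieval with decontextualised embeddings.
# Assumptions (not from the paper's code): bert-base-uncased encoder,
# final-layer mean pooling, hand-picked contexts per candidate.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def contextualised(sentence: str, word: str) -> torch.Tensor:
    """Embed `word` as used in `sentence`: mean-pool its subword vectors."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
    piece_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    # Locate the word's subword span (first occurrence) in the sentence.
    for i in range(len(ids) - len(piece_ids) + 1):
        if ids[i:i + len(piece_ids)] == piece_ids:
            return hidden[i:i + len(piece_ids)].mean(dim=0)
    raise ValueError(f"{word!r} not found in {sentence!r}")

def decontextualised(word: str, contexts: list[str]) -> torch.Tensor:
    """Average the word's contextual embeddings over several contexts."""
    return torch.stack([contextualised(c, word) for c in contexts]).mean(dim=0)

# Toy example: rank candidate substitutes for "bright" in one sentence.
target = contextualised("She is a bright student.", "bright")
example_contexts = {  # hypothetical contexts; the paper samples a corpus
    "intelligent": ["an intelligent answer", "he is an intelligent man"],
    "clever":      ["a clever trick", "she gave a clever reply"],
    "shiny":       ["a shiny coin", "the shiny surface reflected light"],
}
scores = {
    w: torch.cosine_similarity(target, decontextualised(w, ctxs), dim=0).item()
    for w, ctxs in example_contexts.items()
}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Because the decontextualised side is averaged over many usages, candidates are compared by their general sense rather than by how well the language model would generate them in this one slot, which is what lets the method surface low-frequency substitutes.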
