On the Sentence Embeddings from Pre-trained Language Models

11/02/2020
by Bohan Li, et al.

Pre-trained contextual representations like BERT have achieved great success in natural language processing. However, the sentence embeddings from pre-trained language models without fine-tuning have been found to poorly capture the semantic meaning of sentences. In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited. We first reveal the theoretical connection between the masked language model pre-training objective and the semantic similarity task, and then analyze the BERT sentence embeddings empirically. We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance on semantic textual similarity. To address this issue, we propose to transform the anisotropic sentence embedding distribution into a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective. Experimental results show that our proposed BERT-flow method obtains significant performance gains over the state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks. The code is available at https://github.com/bohanli/BERT-flow.

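To make the idea concrete, here is a minimal sketch (not the authors' released code) of the flow-calibration step: an invertible RealNVP-style coupling flow is fit to precomputed, unlabeled sentence embeddings by maximum likelihood under a standard Gaussian base, and similarity is then measured on the transformed vectors. The embedding tensor, layer sizes, and training schedule below are placeholder assumptions for illustration; see the repository linked above for the actual implementation.

```python
# Minimal sketch of flow-based calibration of sentence embeddings.
# Assumptions (not from the paper): a small RealNVP-style coupling flow,
# random vectors standing in for mean-pooled BERT embeddings, and a short
# full-batch training loop. BERT itself stays frozen; only the flow is trained.
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Invertible layer: keeps one half of the vector, affinely transforms the other."""
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=-1)
        s = torch.tanh(s)                        # bound the log-scales for stability
        z2 = x2 * torch.exp(s) + t
        return torch.cat([x1, z2], dim=-1), s.sum(dim=-1)  # output, log|det Jacobian|

class Flow(nn.Module):
    def __init__(self, dim, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(AffineCoupling(dim) for _ in range(n_layers))

    def forward(self, x):
        log_det = x.new_zeros(x.size(0))
        for layer in self.layers:
            x, ld = layer(x)
            x = x.flip(dims=[-1])                # permute features so both halves get updated
            log_det = log_det + ld
        return x, log_det

def nll(z, log_det):
    # Negative log-likelihood under a standard Gaussian base distribution.
    log_pz = -0.5 * (z.pow(2) + math.log(2 * math.pi)).sum(dim=-1)
    return -(log_pz + log_det).mean()

# Placeholder for unlabeled, precomputed BERT sentence embeddings (batch x dim).
embeddings = torch.randn(512, 768)

flow = Flow(dim=768)
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
for step in range(200):                          # purely unsupervised objective
    z, log_det = flow(embeddings)
    loss = nll(z, log_det)
    opt.zero_grad()
    loss.backward()
    opt.step()

# At test time, semantic similarity is computed on the calibrated vectors z,
# which the flow has pushed toward an isotropic Gaussian.
with torch.no_grad():
    z, _ = flow(embeddings)
    sim = torch.nn.functional.cosine_similarity(z[0], z[1], dim=0)
```

The property this sketch mirrors is that the flow is invertible and trained only on unlabeled sentences, so it can reshape the anisotropic embedding distribution without discarding the semantic information BERT already encodes.
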
Related research

05/18/2023
Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings
Prior studies diagnose the anisotropy problem in sentence representation...

10/23/2020
GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method
Large pre-trained language models such as BERT have been the driving for...

03/29/2021
Whitening Sentence Representations for Better Semantics and Faster Retrieval
Pre-training models such as BERT have achieved great success in many nat...

09/13/2022
Don't Judge a Language Model by Its Last Layer: Contrastive Learning with Layer-Wise Attention Pooling
Recent pre-trained language models (PLMs) achieved great success on many...

10/21/2020
Latte-Mix: Measuring Sentence Semantic Similarity with Latent Categorical Mixtures
Measuring sentence semantic similarity using pre-trained language models...

04/13/2022
HIT at SemEval-2022 Task 2: Pre-trained Language Model for Idioms Detection
The same multi-word expressions may have different meanings in different...

12/08/2022
Explain to me like I am five – Sentence Simplification Using Transformers
Sentence simplification aims at making the structure of text easier to r...
