Whitening Sentence Representations for Better Semantics and Faster Retrieval

03/29/2021
by   Jianlin Su, et al.
0

Pre-training models such as BERT have achieved great success in many natural language processing tasks. However, how to obtain better sentence representation through these pre-training models is still worthy to exploit. Previous work has shown that the anisotropy problem is an critical bottleneck for BERT-based sentence representation which hinders the model to fully utilize the underlying semantic features. Therefore, some attempts of boosting the isotropy of sentence distribution, such as flow-based model, have been applied to sentence representations and achieved some improvement. In this paper, we find that the whitening operation in traditional machine learning can similarly enhance the isotropy of sentence representations and achieve competitive results. Furthermore, the whitening technique is also capable of reducing the dimensionality of the sentence representation. Our experimental results show that it can not only achieve promising performance but also significantly reduce the storage cost and accelerate the model retrieval speed.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/02/2020

On the Sentence Embeddings from Pre-trained Language Models

Pre-trained contextual representations like BERT have achieved great suc...
research
05/13/2022

Improving Contextual Representation with Gloss Regularized Pre-training

Though achieving impressive results on many NLP tasks, the BERT-like mas...
research
10/13/2020

CAPT: Contrastive Pre-Training for LearningDenoised Sequence Representations

Pre-trained self-supervised models such as BERT have achieved striking s...
research
01/09/2021

Learning Better Sentence Representation with Syntax Information

Sentence semantic understanding is a key topic in the field of natural l...
research
04/25/2023

Compressing Sentence Representation with maximum Coding Rate Reduction

In most natural language inference problems, sentence representation is ...
research
08/04/2020

Taking Notes on the Fly Helps BERT Pre-training

How to make unsupervised language pre-training more efficient and less r...
research
06/02/2023

A Simple yet Effective Self-Debiasing Framework for Transformer Models

Current Transformer-based natural language understanding (NLU) models he...

Please sign up or login with your details

Forgot password? Click here to reset