Don't Judge a Language Model by Its Last Layer: Contrastive Learning with Layer-Wise Attention Pooling

09/13/2022
by Dongsuk Oh, et al.

Recent pre-trained language models (PLMs) have achieved great success on many natural language processing tasks by learning linguistic features and contextualized sentence representations. Because the attributes captured in the stacked layers of a PLM are not clearly identified, straightforward approaches such as taking the last layer's embedding are commonly used to derive sentence representations. This paper introduces an attention-based pooling strategy that preserves the layer-wise signals captured in each layer and learns digested linguistic features for downstream tasks. A contrastive learning objective adapts the layer-wise attention pooling to both unsupervised and supervised settings, regularizing the anisotropic space of pre-trained embeddings and making it more uniform. We evaluate our model on standard semantic textual similarity (STS) and semantic search tasks. Our method improves the performance of contrastively learned BERT_base and its variants.
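
The layer-wise attention pooling described above can be pictured as a small module on top of the encoder. Below is a minimal, hypothetical PyTorch sketch (not the authors' released code): it mean-pools tokens within each hidden layer, learns attention weights over the layers, and pairs the pooled embedding with a SimCSE-style contrastive loss. The class name, the single-query attention over layers, and the within-layer mean pooling are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

class LayerwiseAttentionPooler(nn.Module):
    """Sketch: pool a sentence embedding from all encoder layers with a
    learned attention over layers, instead of using only the last layer."""

    def __init__(self, model_name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name, output_hidden_states=True)
        hidden = self.encoder.config.hidden_size
        # One attention query shared across layers (assumed parameterization;
        # the paper's exact design may differ).
        self.layer_query = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # hidden_states: tuple of (num_layers + 1) tensors of shape [batch, seq, hidden]
        hidden_states = torch.stack(out.hidden_states, dim=1)   # [batch, layers, seq, hidden]

        # Mean-pool tokens within each layer, ignoring padding.
        mask = attention_mask[:, None, :, None].float()
        layer_reps = (hidden_states * mask).sum(2) / mask.sum(2).clamp(min=1e-9)  # [batch, layers, hidden]

        # Attention weights over layers, then a weighted sum.
        scores = self.layer_query(layer_reps).squeeze(-1)        # [batch, layers]
        weights = F.softmax(scores, dim=-1).unsqueeze(-1)        # [batch, layers, 1]
        return (weights * layer_reps).sum(1)                     # [batch, hidden]


def simcse_style_loss(emb_a, emb_b, temperature=0.05):
    """Unsupervised contrastive (InfoNCE) loss between two views of the same batch."""
    sim = F.cosine_similarity(emb_a.unsqueeze(1), emb_b.unsqueeze(0), dim=-1) / temperature
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = LayerwiseAttentionPooler("bert-base-uncased")
    batch = tok(["a small test sentence", "another sentence"], padding=True, return_tensors="pt")
    emb_a = model(batch["input_ids"], batch["attention_mask"])   # first dropout view
    emb_b = model(batch["input_ids"], batch["attention_mask"])   # second dropout view
    print(simcse_style_loss(emb_a, emb_b))
```

In the unsupervised setting, the two views emb_a and emb_b can be obtained by encoding the same batch twice so that dropout supplies the augmentation, as in SimCSE; in the supervised setting, entailment pairs can provide the positives.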

Related research

On the Sentence Embeddings from Pre-trained Language Models (11/02/2020)
Pre-trained contextual representations like BERT have achieved great suc...

SimCSE: Simple Contrastive Learning of Sentence Embeddings (04/18/2021)
This paper presents SimCSE, a simple contrastive learning framework that...

Enhancing Out-of-Distribution Detection in Natural Language Understanding via Implicit Layer Ensemble (10/20/2022)
Out-of-distribution (OOD) detection aims to discern outliers from the in...

CLEAR: Contrastive Learning for Sentence Representation (12/31/2020)
Pre-trained language models have proven their unique powers in capturing...

PCL: Peer-Contrastive Learning with Diverse Augmentations for Unsupervised Sentence Embeddings (01/28/2022)
Learning sentence embeddings in an unsupervised manner is fundamental in...

Compressing Sentence Representation with Maximum Coding Rate Reduction (04/25/2023)
In most natural language inference problems, sentence representation is ...

Focused Contrastive Training for Test-based Constituency Analysis (09/30/2021)
We propose a scheme for self-training of grammaticality models for const...
