Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations

09/27/2021
by Fangyu Liu, et al.

In NLP, a large number of tasks involve pairwise comparison of two sequences (e.g., sentence similarity and paraphrase identification). Two formulations predominate for sentence-pair tasks: bi-encoders and cross-encoders. Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient, but they usually underperform cross-encoders. Cross-encoders can leverage their attention heads to exploit inter-sentence interactions for better performance, but they require task-specific fine-tuning and are computationally more expensive. In this paper, we present a completely unsupervised sentence-pair model, termed Trans-Encoder, that combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi- and cross-encoders. Specifically, starting from a pre-trained language model (PLM), we first convert it into an unsupervised bi-encoder and then alternate between the bi- and cross-encoder task formulations. In each alternation, one formulation produces pseudo-labels that serve as learning signals for the other. We then propose an extension that conducts this self-distillation on multiple PLMs in parallel and uses the average of their pseudo-labels for mutual distillation. Trans-Encoder creates, to the best of our knowledge, the first completely unsupervised cross-encoder and also a state-of-the-art unsupervised bi-encoder for sentence similarity. Both the bi-encoder and cross-encoder formulations of Trans-Encoder outperform recently proposed state-of-the-art unsupervised sentence encoders such as Mirror-BERT and SimCSE by up to 5% on the sentence similarity benchmarks.
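
To make the alternation concrete, below is a minimal PyTorch sketch of one Trans-Encoder-style self-distillation cycle. It assumes the Hugging Face transformers library; the model choice (bert-base-uncased), the toy unlabeled_pairs data, the MSE regression loss, and the helper names bi_score, cross_score, and distill_step are illustrative assumptions rather than the authors' released code.

```python
# Minimal sketch of one bi<->cross self-distillation cycle (illustrative only).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bi_encoder = AutoModel.from_pretrained("bert-base-uncased")        # encodes sentences independently
cross_encoder = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1)                             # attends over the concatenated pair

def bi_score(s1, s2):
    """Bi-encoder: mean-pool each sentence separately, compare with cosine (rescaled to [0, 1])."""
    def embed(sents):
        batch = tok(sents, padding=True, truncation=True, return_tensors="pt")
        hidden = bi_encoder(**batch).last_hidden_state             # (B, T, H)
        mask = batch["attention_mask"].unsqueeze(-1)
        return (hidden * mask).sum(1) / mask.sum(1)                # mean pooling over tokens
    return 0.5 * (1 + F.cosine_similarity(embed(s1), embed(s2)))

def cross_score(s1, s2):
    """Cross-encoder: full attention over the concatenated pair, scalar score in [0, 1]."""
    batch = tok(s1, s2, padding=True, truncation=True, return_tensors="pt")
    return torch.sigmoid(cross_encoder(**batch).logits.squeeze(-1))

def distill_step(teacher_fn, student_model, student_fn, pairs, lr=2e-5):
    """One direction of the loop: the teacher's scores on unlabeled pairs
    become regression targets (pseudo-labels) for the student."""
    student_model.train()
    opt = torch.optim.AdamW(student_model.parameters(), lr=lr)
    for s1, s2 in pairs:
        with torch.no_grad():
            target = teacher_fn(s1, s2)                            # pseudo-label
        loss = F.mse_loss(student_fn(s1, s2), target)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Toy stand-in for a real unlabeled sentence-pair corpus.
unlabeled_pairs = [(["A man plays guitar."], ["Someone is playing music."])]

# One full cycle per iteration: the bi-encoder teaches the cross-encoder, then vice versa.
for _ in range(2):
    distill_step(bi_score, cross_encoder, cross_score, unlabeled_pairs)
    distill_step(cross_score, bi_encoder, bi_score, unlabeled_pairs)
```

For the mutual-distillation extension described in the abstract, the same loop would run over several PLMs in parallel, with each student regressing onto the average of all teachers' pseudo-labels instead of a single teacher's.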

Related research

Research · 10/16/2020
Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks
There are two approaches for pairwise sentence scoring: Cross-encoders, ...

Research · 03/11/2021
Improving Bi-encoder Document Ranking Models with Two Rankers and Multi-teacher Distillation
BERT-based Neural Ranking Models (NRMs) can be classified according to h...

Research · 04/22/2019
Real-time Inference in Multi-sentence Tasks with Deep Pretrained Transformers
The use of deep pretrained bidirectional transformers has led to remarka...

Research · 10/28/2021
Semi-Siamese Bi-encoder Neural Ranking Model Using Lightweight Fine-Tuning
A BERT-based Neural Ranking Model (NRM) can be either a cross-encoder or...

Research · 04/23/2020
Distilling Knowledge for Fast Retrieval-based Chat-bots
Response retrieval is a subset of neural ranking in which a model select...

Research · 10/01/2021
Building an Efficient and Effective Retrieval-based Dialogue System via Mutual Learning
Establishing retrieval-based dialogue systems that can select appropriat...

Research · 11/02/2022
Cross-stitching Text and Knowledge Graph Encoders for Distantly Supervised Relation Extraction
Bi-encoder architectures for distantly-supervised relation extraction ar...
