Less is More: Pre-training a Strong Siamese Encoder Using a Weak Decoder

02/18/2021
by Shuqi Lu, et al.

Many real-world applications use Siamese networks to match text sequences efficiently at scale, and these applications require high-quality sequence encodings. This paper pre-trains language models dedicated to sequence matching in Siamese architectures. We first hypothesize that a representation is better for sequence matching if the entire sequence can be reconstructed from it. This, however, is unlikely to happen in standard autoencoders: a strong decoder can rely on its own capacity and on natural language patterns to reconstruct the sequence, bypassing the need for a better sequence encoding. We therefore propose a new self-learning method that pre-trains the encoder with a weak decoder, which reconstructs the original sequence from the encoder's [CLS] representation but is restricted in both capacity and attention span. In our experiments on web search and recommendation, the pre-trained SEED-Encoder ("SiamEsE oriented encoder by reconstructing from weak decoder") shows significantly better generalization when fine-tuned in Siamese networks, improving both overall accuracy and few-shot performance. Our code and models will be released.
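
One way to picture a decoder "restricted in attention span" is as an attention mask over the decoder's inputs. The sketch below is a hypothetical illustration, not the authors' implementation. It assumes that the encoder's [CLS] vector occupies decoder position 0, that every target token may always attend to that slot, and that each token can otherwise see only itself and the `span` tokens immediately before it. The function name `weak_decoder_attention_mask` and the `span` parameter are our own.

```python
from typing import List


def weak_decoder_attention_mask(seq_len: int, span: int) -> List[List[bool]]:
    """Build a (seq_len + 1) x (seq_len + 1) boolean attention mask.

    Hypothetical layout (an assumption, not taken from the paper):
    position 0 holds the encoder's [CLS] vector, and positions
    1..seq_len hold the target tokens. mask[i][j] is True when
    query position i is allowed to attend to key position j.
    """
    if seq_len < 0 or span < 0:
        raise ValueError("seq_len and span must be non-negative")
    size = seq_len + 1
    mask = [[False] * size for _ in range(size)]
    for i in range(size):
        # Every position may attend to the [CLS] slot, the decoder's
        # only access to the sequence-level encoding.
        mask[i][0] = True
        for j in range(1, i + 1):
            # Causal (j <= i) and limited to itself plus the `span`
            # tokens directly before it.
            if i - j <= span:
                mask[i][j] = True
    return mask


if __name__ == "__main__":
    for row in weak_decoder_attention_mask(seq_len=4, span=1):
        print("".join("x" if allowed else "." for allowed in row))
```

With `span=1`, the last token sees only the [CLS] slot, its predecessor, and itself. Anything farther back is masked out, so the decoder cannot reconstruct the sequence from long-range context alone and must draw on the [CLS] representation. Limiting capacity, for example with a shallow decoder, is the other half of the weakening and is not shown here. Note that some frameworks use the opposite boolean convention for attention masks (True meaning "blocked"), so invert the mask accordingly before passing it to one.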
