Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation

03/15/2022
by Xuandong Zhao, et al.

How can we learn highly compact yet effective sentence representations? Pre-trained language models have been effective in many NLP tasks, but they are often huge and produce large sentence embeddings, and there is a wide performance gap between large and small models. In this paper, we propose Homomorphic Projective Distillation (HPD) to learn compressed sentence embeddings. Our method augments a small Transformer encoder with learnable projection layers to produce compact representations while mimicking a large pre-trained language model to retain sentence representation quality. We evaluate our method with different model sizes on both semantic textual similarity (STS) and semantic retrieval (SR) tasks. Experiments show that our method achieves a 2.7-4.5 point performance gain on STS tasks compared with previous best representations of the same size. On SR tasks, our method improves retrieval speed (8.2×) and memory usage (8.0×) compared with state-of-the-art large models.
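To make the distillation setup concrete, the sketch below shows one way a small student encoder with a learnable projection head could be trained to mimic a large teacher's sentence embeddings in a shared compact space. This is a minimal illustration in the spirit of the abstract, not the authors' released implementation: the model names, the 128-dimensional target size, mean pooling, the trainable teacher-side projection (standing in for whatever dimensionality reduction the paper uses), and the plain MSE objective are all assumptions made for the example.

```python
# Hedged sketch of projection-based distillation for compact sentence embeddings.
# Model names, dimensions, pooling, and the MSE loss are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

TARGET_DIM = 128  # compact embedding size (assumed)

def mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings, ignoring padded positions."""
    mask = attention_mask.unsqueeze(-1).float()
    return (last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

# Large frozen teacher and a small trainable student (placeholder checkpoints).
teacher_tok = AutoTokenizer.from_pretrained("princeton-nlp/sup-simcse-roberta-large")
teacher = AutoModel.from_pretrained("princeton-nlp/sup-simcse-roberta-large").eval()
student_tok = AutoTokenizer.from_pretrained("nreimers/MiniLM-L6-H384-uncased")
student = AutoModel.from_pretrained("nreimers/MiniLM-L6-H384-uncased")

# Projection heads map both sides into the same compact space.
teacher_proj = nn.Linear(teacher.config.hidden_size, TARGET_DIM)
student_proj = nn.Linear(student.config.hidden_size, TARGET_DIM)

optimizer = torch.optim.AdamW(
    list(student.parameters()) + list(student_proj.parameters())
    + list(teacher_proj.parameters()),
    lr=2e-5,
)

sentences = ["A man is playing a guitar.", "Someone plays an instrument."]

for step in range(1):  # single illustrative training step
    with torch.no_grad():
        t_in = teacher_tok(sentences, padding=True, return_tensors="pt")
        t_emb = mean_pool(teacher(**t_in).last_hidden_state, t_in["attention_mask"])
    s_in = student_tok(sentences, padding=True, return_tensors="pt")
    s_emb = mean_pool(student(**s_in).last_hidden_state, s_in["attention_mask"])

    # The student's projected embedding learns to match the projected teacher embedding.
    loss = nn.functional.mse_loss(student_proj(s_emb), teacher_proj(t_emb))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: distillation loss = {loss.item():.4f}")
```

After training, only the small student and its projection head would be kept, so each sentence is stored as a 128-dimensional vector; this is what yields the retrieval speed and memory savings reported for the SR tasks.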


research
04/25/2023

Compressing Sentence Representation with Maximum Coding Rate Reduction

In most natural language inference problems, sentence representation is ...
research
07/09/2019

Multilingual Universal Sentence Encoder for Semantic Retrieval

We introduce two pre-trained retrieval focused multilingual sentence enc...
research
05/04/2023

RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models

To better support information retrieval tasks such as web search and ope...
research
04/29/2020

Revisiting Round-Trip Translation for Quality Estimation

Quality estimation (QE) is the task of automatically evaluating the qual...
research
08/15/2019

Hamming Sentence Embeddings for Information Retrieval

In retrieval applications, binary hashes are known to offer significant ...
research
09/27/2022

Regularized Contrastive Learning of Semantic Search

Semantic search is an important task whose objective is to find the rele...
research
06/19/2019

Learning Compressed Sentence Representations for On-Device Text Processing

Vector representations of sentences, trained on massive text corpora, ar...
