Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

08/27/2019
by Nils Reimers, et al.

BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state of the art on sentence-pair regression tasks like semantic textual similarity (STS). However, they require that both sentences be fed into the network together, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (about 65 hours) with BERT. This construction makes BERT unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy of BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where they outperform other state-of-the-art sentence embedding methods.
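
The "about 50 million" figure follows from counting unordered pairs, n(n-1)/2, for n = 10,000. The sketch below reproduces that count and illustrates the alternative described above: once each sentence has a fixed embedding, the most similar pair can be found with cosine similarity alone. It is a minimal illustration, not the authors' implementation. The function names and the toy 3-dimensional vectors are hypothetical stand-ins for real sentence embeddings.

```python
import math
from itertools import combinations


def num_pairs(n):
    """Number of unordered pairs among n sentences: n(n-1)/2."""
    return n * (n - 1) // 2


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_pair(embeddings):
    """Return (i, j, score) for the pair with the highest cosine similarity."""
    best = None
    for i, j in combinations(range(len(embeddings)), 2):
        score = cosine(embeddings[i], embeddings[j])
        if best is None or score > best[2]:
            best = (i, j, score)
    return best


# Toy vectors standing in for sentence embeddings (made up for illustration).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

pair_count = num_pairs(10_000)  # pairs a cross-encoder would have to score
best_pair = most_similar_pair(toy_embeddings)
print(pair_count)
print(best_pair[:2])
```

With a pair-scoring model, each of those pairs costs one full network pass. With precomputed embeddings, the network runs only once per sentence, and the pairwise step reduces to inexpensive vector arithmetic.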

research
04/18/2021

Dual-View Distilled BERT for Sentence Embedding

Recently, BERT realized significant progress for sentence matching via w...
research
09/25/2020

An Unsupervised Sentence Embedding Method by Mutual Information Maximization

BERT is inefficient for sentence-pair tasks such as clustering or semant...
research
01/26/2021

Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks

Contextualized representations from a pre-trained language model are cen...
research
10/05/2021

Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings

Semantic sentence embeddings are usually supervisedly built minimizing d...
research
09/03/2019

Transfer Fine-Tuning: A BERT Case Study

A semantic equivalence assessment is defined as a task that assesses sem...
research
04/17/2021

ASBERT: Siamese and Triplet network embedding for open question answering

Answer selection (AS) is an essential subtask in the field of natural la...
research
09/05/2023

Using a Nearest-Neighbour, BERT-Based Approach for Scalable Clone Detection

Code clones can detrimentally impact software maintenance and manually d...
