SGPT: GPT Sentence Embeddings for Semantic Search

02/17/2022
by Niklas Muennighoff, et al.

GPT transformers are the largest language models available, yet semantic search is dominated by BERT transformers. We present SGPT-BE and SGPT-CE for applying GPT models as Bi-Encoders or Cross-Encoders to symmetric or asymmetric search. SGPT-BE produces semantically meaningful sentence embeddings by contrastive fine-tuning of only bias tensors and a novel pooling method. A 5.8 billion parameter SGPT-BE outperforms the best available sentence embeddings by 6%, setting a new state-of-the-art on BEIR. It outperforms the concurrently proposed OpenAI Embeddings of the 175B Davinci endpoint, which fine-tunes 250,000 times more parameters. SGPT-CE uses log probabilities from GPT models without any fine-tuning. A 6.1 billion parameter SGPT-CE sets an unsupervised state-of-the-art on BEIR. It beats the supervised state-of-the-art on 7 datasets, but loses significantly on other datasets. We show how this can be alleviated by adapting the prompt. SGPT-BE and SGPT-CE performance scales with model size. Yet, increased latency, storage and compute costs should be considered. Code, models and result files are freely available at https://github.com/Muennighoff/sgpt.
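To make the Bi-Encoder versus Cross-Encoder distinction concrete, here is a minimal Python sketch of how each kind of score could be turned into a ranking. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not say how SGPT-CE combines log probabilities or which similarity SGPT-BE uses, so the sketch assumes a sum of per-token query log probabilities for the Cross-Encoder and cosine similarity for the Bi-Encoder. The function names, document ids, embeddings and log probabilities are all hypothetical toy values, and no model is called.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_bi_encoder(query_embedding, doc_embeddings):
    """Bi-Encoder style: query and documents are embedded independently,
    then compared. Cosine similarity is an assumption of this sketch."""
    scores = {
        doc_id: cosine_similarity(query_embedding, embedding)
        for doc_id, embedding in doc_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def rank_cross_encoder(query_token_log_probs):
    """Cross-Encoder style: each (document, query) pair is scored jointly.
    Assumption: the score is the sum of the log probabilities a language
    model assigns to the query tokens when conditioned on the document."""
    scores = {
        doc_id: sum(log_probs)
        for doc_id, log_probs in query_token_log_probs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy inputs; a real system would obtain these from a model.
query_embedding = [1.0, 0.0]
doc_embeddings = {"doc_a": [0.9, 0.1], "doc_b": [0.0, 1.0]}
bi_ranking = rank_bi_encoder(query_embedding, doc_embeddings)

query_token_log_probs = {"doc_a": [-0.5, -1.0], "doc_b": [-2.0, -3.0]}
ce_ranking = rank_cross_encoder(query_token_log_probs)

print(bi_ranking)
print(ce_ranking)
```

In the Bi-Encoder case, document embeddings can be computed once and stored, while the Cross-Encoder case needs one model pass per query-document pair. This is one reason the latency, storage and compute costs mentioned in the abstract differ between the two settings.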
