MTEB: Massive Text Embedding Benchmark

10/13/2022
by Niklas Muennighoff, et al.

Text embeddings are commonly evaluated on a small set of datasets from a single task, which does not cover their possible applications to other tasks. It is unclear whether state-of-the-art embeddings on semantic textual similarity (STS) can be equally well applied to other tasks like clustering or reranking. This makes progress in the field difficult to track, as various models are constantly being proposed without proper evaluation. To solve this problem, we introduce the Massive Text Embedding Benchmark (MTEB). MTEB spans 8 embedding tasks covering a total of 56 datasets and 112 languages. Through the benchmarking of 33 models on MTEB, we establish the most comprehensive benchmark of text embeddings to date. We find that no particular text embedding method dominates across all tasks. This suggests that the field has yet to converge on a universal text embedding method and scale it up sufficiently to provide state-of-the-art results on all embedding tasks. MTEB comes with open-source code and a public leaderboard at https://huggingface.co/spaces/mteb/leaderboard.

research
09/14/2023

C-Pack: Packaged Resources To Advance General Chinese Embedding

We introduce C-Pack, a package of resources that significantly advance t...
research
07/20/2023

Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models

Jina Embeddings constitutes a set of high-performance sentence embedding...
research
12/17/2022

Relational Sentence Embedding for Flexible Semantic Matching

We present Relational Sentence Embedding (RSE), a new paradigm to furthe...
research
06/22/2023

Vec2Vec: A Compact Neural Network Approach for Transforming Text Embeddings with High Fidelity

Vector embeddings have become ubiquitous tools for many language-related...
research
01/24/2022

Text and Code Embeddings by Contrastive Pre-Training

Text embeddings are useful features in many applications such as semanti...
research
11/04/2019

Spherical Text Embedding

Unsupervised text embedding has shown great power in a wide range of NLP...
research
11/30/2022

Generalised Spherical Text Embedding

This paper aims to provide an unsupervised modelling approach that allow...
