Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models

07/20/2023
by Michael Günther, et al.

Jina Embeddings is a set of high-performance sentence embedding models that translate textual inputs into numerical representations capturing the semantics of the text. The models excel in applications such as dense retrieval and semantic textual similarity. This paper details the development of Jina Embeddings, starting with the creation of high-quality pairwise and triplet datasets. It underlines the crucial role of data cleaning in dataset preparation, gives in-depth insights into the model training process, and concludes with a comprehensive performance evaluation using the Massive Text Embedding Benchmark (MTEB). To increase the model's awareness of negations, we constructed a novel training and evaluation dataset of negated and non-negated statements, which we make publicly available to the community.
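The abstract names dense retrieval, semantic textual similarity, and a negation evaluation dataset as the settings where these embeddings are used. The sketch below shows how such embedding vectors are commonly consumed, assuming a model has already mapped each text to a fixed-length vector. Cosine similarity scores a pair of texts. Ranking documents by that score gives dense retrieval. A triplet check asks whether an anchor statement lies closer to a paraphrase than to its negation. The toy vectors, the function names, and the (anchor, positive, negated) triplet layout are illustrative assumptions. This is not the paper's evaluation code or the layout of its released dataset.

```python
import math
from typing import List, Sequence, Tuple

# A sentence embedding is modeled as a plain sequence of floats (assumption:
# vectors come from some embedding model; here they are hand-made toys).
Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_documents(query: Vector, docs: List[Vector]) -> List[int]:
    """Dense retrieval: document indices sorted by decreasing similarity."""
    scores = [cosine_similarity(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


def negation_accuracy(triplets: List[Tuple[Vector, Vector, Vector]]) -> float:
    """Fraction of hypothetical (anchor, positive, negated) triplets in which
    the anchor is more similar to the positive than to the negated statement."""
    if not triplets:
        return 0.0
    hits = sum(
        cosine_similarity(anchor, pos) > cosine_similarity(anchor, neg)
        for anchor, pos, neg in triplets
    )
    return hits / len(triplets)


if __name__ == "__main__":
    query = [1.0, 0.0, 0.0]
    docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
    print(rank_documents(query, docs))  # [1, 2, 0]

    triplets = [
        ([1.0, 0.0], [0.8, 0.2], [0.2, 0.8]),  # positive wins: counted
        ([0.0, 1.0], [0.6, 0.4], [0.1, 0.9]),  # negation wins: missed
    ]
    print(negation_accuracy(triplets))  # 0.5
```

Cosine similarity is used here because it ignores vector length and compares only direction, which is a common choice for sentence embeddings. A real evaluation would replace the toy vectors with model outputs.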