Retrieving Texts based on Abstract Descriptions

05/21/2023
by   Shauli Ravfogel, et al.

In this work, we aim to connect two research areas: instruction models and retrieval-based models. While instruction-tuned Large Language Models (LLMs) excel at extracting information from text, they are not suitable for semantic retrieval. Similarity search over embedding vectors allows vectors to be indexed and queried, but the similarity reflected in the embedding is sub-optimal for many use cases. We identify the task of retrieving sentences based on abstract descriptions of their content. We demonstrate the inadequacy of current text embeddings and propose an alternative model that performs significantly better when used in standard nearest neighbor search. The model is trained using positive and negative pairs sourced by prompting an LLM. While it is easy to source the training material from an LLM, the retrieval task cannot be performed by the LLM directly. This demonstrates that data from LLMs can be used not only for distilling specialized models that are more efficient than the original LLM, but also for creating new capabilities not immediately possible using the original model.
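To make the "standard nearest neighbor search" setting concrete, the following is a minimal sketch of retrieval by cosine similarity over sentence embeddings. It is not the authors' model or code: the three-dimensional vectors are hand-written toy values standing in for real embeddings, and the names `cosine` and `nearest_neighbors` are hypothetical helpers introduced only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbors(query_vec, index, k=2):
    """Return the k (sentence, score) pairs most similar to query_vec."""
    scored = [(sentence, cosine(query_vec, vec)) for sentence, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Assumption: toy 3-d vectors standing in for learned sentence embeddings.
index = [
    ("The firm acquired its rival in 2019.", [0.9, 0.1, 0.0]),
    ("The river floods every spring.", [0.0, 0.2, 0.9]),
    ("Two banks announced a merger.", [0.8, 0.3, 0.1]),
]

# Assumption: a made-up embedding of the abstract description
# "a business deal between companies".
query_vec = [1.0, 0.2, 0.0]

top = nearest_neighbors(query_vec, index, k=2)
for sentence, score in top:
    print(f"{score:.3f}  {sentence}")
```

In this setting the search procedure stays fixed and only the encoder that produces the vectors changes, which is why an embedding trained on description-sentence pairs could be dropped into an existing index without altering the retrieval code.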


research
05/29/2023

Test-Time Training on Nearest Neighbors for Large Language Models

Many recent efforts aim to augment language models with relevant informa...
research
03/27/2023

Unified Text Structuralization with Instruction-tuned Language Models

Text structuralization is one of the important fields of natural languag...
research
03/10/2023

Semantic-Preserving Augmentation for Robust Image-Text Retrieval

Image text retrieval is a task to search for the proper textual descript...
research
01/07/2023

Why do Nearest Neighbor Language Models Work?

Language models (LMs) compute the probability of a text by sequentially ...
research
11/19/2018

End-to-End Retrieval in Continuous Space

Most text-based information retrieval (IR) systems index objects by word...
research
10/05/2022

Contextualized Generative Retrieval

The text retrieval task is mainly performed in two ways: the bi-encoder ...
research
05/10/2023

Evaluating Embedding APIs for Information Retrieval

The ever-increasing size of language models curtails their widespread ac...
