Efficient Approximate Search for Sets of Vectors

07/14/2021
by Michael Leybovich, et al.

We consider a similarity measure between two sets A and B of vectors that balances the average and the maximum cosine distance between pairs of vectors, one from set A and one from set B. As motivation for this measure, we present lineage tracking in a database. To realize this measure in practice, we need an approximate search algorithm that, given a set of vectors A and sets of vectors B_1, ..., B_n, quickly locates the set B_i that maximizes the similarity measure. When all sets are singletons, i.e., each set is essentially a single vector, efficient approximate search algorithms are known, e.g., approximate versions of tree search algorithms, locality-sensitive hashing (LSH), vector quantization (VQ), and proximity graph algorithms. In this work, we present approximate search algorithms for the general case. The underlying idea in these algorithms is to encode a set of vectors as a single "long" vector. The proposed approximate approach achieves significant performance gains over an optimized exact search on vector sets.
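The abstract does not give the exact formula for the set-to-set measure, so the sketch below is only an illustration under stated assumptions. It assumes a hypothetical convex combination, with weight `alpha`, of the mean and the maximum pairwise cosine similarity (higher is better, so the best B_i maximizes the score). `exact_search` is a naive linear scan, not the authors' optimized exact baseline. `mean_encoding` is a hypothetical, deliberately simple way to collapse a set into one vector; it is not the paper's "long" vector encoding.

```python
import numpy as np


def normalize_rows(X):
    """Scale each row (one vector of the set) to unit L2 norm."""
    X = np.asarray(X, dtype=float)
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def set_similarity(A, B, alpha=0.5):
    """Hypothetical balanced measure: alpha * mean + (1 - alpha) * max
    of the cosine similarities over all pairs (a, b) with a in A, b in B."""
    S = normalize_rows(A) @ normalize_rows(B).T  # |A| x |B| cosine matrix
    return alpha * S.mean() + (1 - alpha) * S.max()


def exact_search(A, candidates, alpha=0.5):
    """Naive exact search: score every candidate set B_i, return the best index."""
    scores = [set_similarity(A, B, alpha) for B in candidates]
    return int(np.argmax(scores)), scores


def mean_encoding(X):
    """Collapse a set into one vector: the mean of its unit-normalized members."""
    return normalize_rows(X).mean(axis=0)


# Usage on random data: one query set A and three candidate sets of different sizes.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 8))
Bs = [rng.normal(size=(k, 8)) for k in (2, 4, 5)]
best, scores = exact_search(A, Bs)

# The average-cosine term equals a single dot product of the two set encodings.
S0 = normalize_rows(A) @ normalize_rows(Bs[0]).T
print(np.isclose(S0.mean(), mean_encoding(A) @ mean_encoding(Bs[0])))  # True
```

The last check holds because the mean over all pairs of unit vectors factors as the dot product of the two per-set means, so the average component alone reduces to single-vector search. The maximum component does not factor this way, which suggests why a richer single-vector encoding is needed for the balanced measure.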

10/27/2022

DESSERT: An Efficient Algorithm for Vector Set Search with Vector Set Queries

We study the problem of vector set search with vector set queries. This ...
03/06/2020

LSF-Join: Locality Sensitive Filtering for Distributed All-Pairs Set Similarity Under Skew

All-pairs set similarity is a widely used data mining task, even for lar...
10/31/2022

Using Locality-sensitive Hashing for Rendezvous Search

The multichannel rendezvous problem is a fundamental problem for neighbo...
08/25/2019

A Method for Estimating the Proximity of Vector Representation Groups in Multidimensional Space. On the Example of the Paraphrase Task

The following paper presents a method of comparing two sets of vectors. ...
04/09/2018

Set Similarity Search for Skewed Data

Set similarity join, as well as the corresponding indexing problem set s...
10/04/2017

GraphMatch: Efficient Large-Scale Graph Construction for Structure from Motion

We present GraphMatch, an approximate yet efficient method for building ...
12/18/2018

Index-based, High-dimensional, Cosine Threshold Querying with Optimality Guarantees

Given a database of vectors, a cosine threshold query returns all vector...