KOIOS: Top-k Semantic Overlap Set Search

04/20/2023
by Pranay Mundra, et al.

We study the top-k set similarity search problem using semantic overlap. While vanilla overlap requires exact matches between set elements, semantic overlap allows elements that are syntactically different but semantically related to increase the overlap. The semantic overlap of two sets is the score of a maximum-weight matching in a bipartite graph over their elements, where the weight of an edge between two elements is given by a user-defined similarity function, e.g., cosine similarity between embeddings. Common techniques such as token indexes fail for semantic search because semantically similar elements may be unrelated at the character level. Further, verifying a candidate is expensive (cubic in the set size, versus linear for syntactic overlap), which calls for highly selective filters. We propose KOIOS, the first exact and efficient algorithm for semantic overlap search. KOIOS leverages sophisticated filters to minimize the number of required graph-matching calculations. Our experiments show that for medium to large sets, less than 5% of the candidate sets need verification, and more than half of those sets are further pruned without requiring the expensive graph matching. We show the efficiency of our algorithm on four real datasets and demonstrate the improved result quality of semantic over vanilla set similarity search.
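To make the definition concrete, the following is a minimal sketch of semantic overlap as a maximum-weight bipartite matching, together with a naive top-k scan. It is not the KOIOS algorithm: it applies no filters and verifies every candidate, and it finds the matching by brute force over permutations, which is only feasible for tiny sets (an exact solver such as the Hungarian algorithm would give the cubic cost mentioned above). The names `toy_sim`, `TOY_SIM`, `semantic_overlap`, and `top_k` are hypothetical, and the similarity table is a made-up stand-in for a user-defined function such as cosine similarity between embeddings.

```python
from heapq import nlargest
from itertools import permutations

# Hypothetical similarity scores, standing in for e.g. embedding cosine similarity.
TOY_SIM = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("big", "large")): 0.8,
}


def toy_sim(a, b):
    """Assumed user-defined similarity: 1.0 for identical elements, table lookup otherwise."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def vanilla_overlap(q, c):
    """Syntactic overlap: number of elements that match exactly."""
    return len(set(q) & set(c))


def semantic_overlap(q, c, sim=toy_sim):
    """Score of a maximum-weight bipartite matching between q and c.

    Brute force: try every injective assignment of the smaller set into the
    larger one. Exponential time, for illustration only.
    """
    small, large = sorted((list(set(q)), list(set(c))), key=len)
    best = 0.0
    for perm in permutations(large, len(small)):
        score = sum(sim(a, b) for a, b in zip(small, perm))
        best = max(best, score)
    return best


def top_k(query, collection, k, sim=toy_sim):
    """Naive baseline: verify every candidate set, keep the k highest scores."""
    scored = ((semantic_overlap(query, c, sim), name) for name, c in collection.items())
    return nlargest(k, scored)


query = {"car", "big", "red"}
collection = {
    "A": {"automobile", "large", "red"},  # semantically close to the query
    "B": {"car", "blue"},                 # one exact match only
    "C": {"tree", "river"},               # unrelated
}
print(vanilla_overlap(query, collection["A"]))   # exact matches only
print(semantic_overlap(query, collection["A"]))  # approximately 2.7
print(top_k(query, collection, k=2))
```

In this toy collection, set A shares only one exact element with the query, yet it ranks first under semantic overlap because "car"/"automobile" and "big"/"large" contribute to the matching score.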
