High-Throughput Vector Similarity Search in Knowledge Graphs

04/04/2023
by Jason Mohoney, et al.

There is an increasing adoption of machine learning for encoding data into vectors to serve online recommendation and search use cases. As a result, recent data management systems propose augmenting query processing with online vector similarity search. In this work, we explore vector similarity search in the context of Knowledge Graphs (KGs). Motivated by the tasks of finding related KG queries and entities for past KG query workloads, we focus on hybrid vector similarity search (hybrid queries for short) where part of the query corresponds to vector similarity search and part of the query corresponds to predicates over relational attributes associated with the underlying data vectors. For example, given past KG queries for a song entity, we want to construct new queries for new song entities whose vector representations are close to the vector representation of the entity in the past KG query. But entities in a KG also have non-vector attributes such as a song associated with an artist, a genre, and a release date. Therefore, suggested entities must also satisfy query predicates over non-vector attributes beyond a vector-based similarity predicate. While these tasks are central to KGs, our contributions are generally applicable to hybrid queries. In contrast to prior works that optimize online queries, we focus on enabling efficient batch processing of past hybrid query workloads. We present our system, HQI, for high-throughput batch processing of hybrid queries. We introduce a workload-aware vector data partitioning scheme to tailor the vector index layout to the given workload and describe a multi-query optimization technique to reduce the overhead of vector similarity computations. We evaluate our methods on industrial workloads and demonstrate that HQI yields a 31x improvement in throughput for finding related KG queries compared to existing hybrid query processing approaches.
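To make the notion of a hybrid query concrete, the following is a minimal brute-force sketch in Python. It is not the HQI system and says nothing about its index layout or its multi-query optimization. The function names (`hybrid_search`, `squared_l2`), the toy song data, and the choice of squared Euclidean distance are all assumptions made for illustration. The sketch first applies the predicate over non-vector attributes, then ranks the surviving entities by vector distance.

```python
import heapq

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hybrid_search(vectors, attrs, query_vec, predicate, k):
    """Return ids of the k entities nearest to query_vec among those passing predicate.

    Hypothetical helper: a full scan, with no index and no batching.
    """
    # Step 1: relational filter over the non-vector attributes.
    candidates = [i for i, a in enumerate(attrs) if predicate(a)]
    # Step 2: vector similarity ranking restricted to the filtered candidates.
    return heapq.nsmallest(
        k, candidates, key=lambda i: squared_l2(vectors[i], query_vec)
    )

# Toy data (assumed): four song entities with 2-d embeddings and a genre.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
attrs = [
    {"genre": "rock"},
    {"genre": "jazz"},
    {"genre": "rock"},
    {"genre": "rock"},
]

# Find the 2 rock songs closest to the query embedding.
result = hybrid_search(
    vectors, attrs, [0.9, 0.1], lambda a: a["genre"] == "rock", k=2
)
print(result)  # [0, 2]
```

Entity 1 is the nearest vector overall, but it is excluded because it fails the genre predicate. This is why a hybrid query cannot be answered by vector similarity search alone.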

research
07/26/2019

qwLSH: Cache-conscious Indexing for Processing Similarity Search Query Workloads in High-Dimensional Spaces

Similarity search queries in high-dimensional spaces are an important ty...
research
05/11/2019

Mining Hidden Populations through Attributed Search

Researchers often query online social platforms through their applicatio...
research
03/25/2022

Navigable Proximity Graph-Driven Native Hybrid Queries with Structured and Unstructured Constraints

As research interest surges, vector similarity search is applied in mult...
research
07/16/2022

HQANN: Efficient and Robust Similarity Search for Hybrid Queries with Structured and Unstructured Constraints

The in-memory approximate nearest neighbor search (ANNS) algorithms have...
research
03/23/2021

HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries (Extended Version)

Hybrid complex analytics workloads typically include (i) data management...
research
07/14/2022

Using Fuzzy Matching of Queries to optimize Database workloads

Directed Acyclic Graphs (DAGs) are commonly used in Databases and Big Da...
research
01/11/2022

ATRAPOS: Evaluating Metapath Query Workloads in Real Time

Heterogeneous information networks (HINs) represent different types of e...
