Query2Vec: An Evaluation of NLP Techniques for Generalized Workload Analytics

01/17/2018
by Shrainik Jain, et al.

We consider methods for learning vector representations of SQL queries to support generalized workload analytics tasks, including workload summarization for index selection and prediction of queries that will trigger memory errors. We consider vector representations of both raw SQL text and optimized query plans, and evaluate these methods on synthetic and real SQL workloads. We find that general algorithms based on vector representations can outperform existing approaches that rely on specialized features. For index recommendation, we cluster the vector representations to compress large workloads with no loss in performance from the recommended indexes. For error prediction, we train a classifier over the learned vectors that can automatically relate subtle syntactic patterns to specific errors raised during query execution. Surprisingly, we also find that these methods enable transfer learning: a model trained on one SQL corpus can be applied to an unrelated corpus and still perform well. We find that these general approaches, when trained on a large corpus of SQL queries, provide a robust foundation for a variety of workload analysis tasks and database features, without requiring application-specific feature engineering.
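
The workload-summarization idea (embed each query as a vector, group similar vectors, keep one representative per group) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' method: normalized bag-of-tokens counts stand in for the learned representations, a greedy leader clustering with a cosine threshold stands in for whatever clustering algorithm the paper uses, and the names `tokenize`, `embed`, `summarize`, and the `threshold` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sql):
    """Split SQL text into lowercase tokens; numeric literals become '<num>'."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z_0-9]*|\d+|[^\sA-Za-z_0-9]", sql.lower())
    return ["<num>" if t.isdigit() else t for t in tokens]


def embed(queries):
    """Map each query to a unit-length token-count vector.

    Assumption: this is a simple stand-in for a learned embedding.
    """
    vocab = sorted({t for q in queries for t in tokenize(q)})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for q in queries:
        vec = [0.0] * len(vocab)
        for token, count in Counter(tokenize(q)).items():
            vec[index[token]] = float(count)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def summarize(queries, threshold=0.8):
    """Greedy leader clustering over query vectors.

    Each query joins the first cluster whose leader is at least `threshold`
    similar; otherwise it starts a new cluster. Returns the indices of the
    leader queries (the compressed workload) and a cluster id per query.
    """
    vectors = embed(queries)
    leaders, assignment = [], []
    for i, vec in enumerate(vectors):
        for cluster_id, leader in enumerate(leaders):
            if cosine(vec, vectors[leader]) >= threshold:
                assignment.append(cluster_id)
                break
        else:
            leaders.append(i)
            assignment.append(len(leaders) - 1)
    return leaders, assignment


# Hypothetical four-query workload: three point lookups and one join.
workload = [
    "SELECT name FROM users WHERE id = 1",
    "SELECT name FROM users WHERE id = 42",
    "SELECT o.total FROM orders o JOIN items i ON o.id = i.order_id",
    "SELECT name FROM users WHERE id = 7",
]
leaders, assignment = summarize(workload)
print([workload[i] for i in leaders])  # one representative per cluster
```

In this toy workload the three point lookups collapse into one cluster and the join forms its own, so an index advisor would only need to see two representative queries. A real pipeline would swap `embed` for a trained model and would have to check that the recommended indexes do not get worse on the full workload.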

research
01/17/2018

Query2Vec: NLP Meets Databases for Generalized Workload Analytics

We propose methods for learning vector representations of SQL workloads ...
research
08/25/2018

Database-Agnostic Workload Management

We present a system to support generalized SQL workload analysis and man...
research
02/21/2020

Facilitating SQL Query Composition and Analysis

Formulating efficient SQL queries requires several cycles of tuning and ...
research
08/09/2021

"What makes my queries slow?": Subgroup Discovery for SQL Workload Analysis

Among daily tasks of database administrators (DBAs), the analysis of que...
research
07/12/2019

Detecting coherent explorations in SQL workloads

This paper presents a proposal aiming at better understanding a workload...
research
06/01/2023

BitE : Accelerating Learned Query Optimization in a Mixed-Workload Environment

Although the many efforts to apply deep reinforcement learning to query ...
research
05/26/2021

Database Workload Characterization with Query Plan Encoders

Smart databases are adopting artificial intelligence (AI) technologies t...
