Facilitating SQL Query Composition and Analysis
Formulating efficient SQL queries requires several cycles of tuning and execution, particularly for inexperienced users. We examine methods that can accelerate and improve this interaction by providing insights about SQL queries prior to execution. We achieve this by predicting query properties such as answer size, run-time, and error class. Unlike existing approaches, our approach does not rely on any statistics from the database instance or on query execution plans. This is particularly important in settings with limited access to the database instance. Our approach uses data-driven machine learning techniques that rely on large query workloads to model SQL queries and their properties. We evaluate the utility of neural network models and traditional machine learning models. We use two real-world query workloads: the Sloan Digital Sky Survey (SDSS) and the SQLShare query workload. Empirical results show that the neural network models are more accurate in predicting the query error class, achieving a higher F-measure on classes with fewer samples, and that they also perform better on other problems such as run-time and answer size prediction. These results are encouraging and confirm that SQL query workloads and data-driven machine learning methods can be leveraged to facilitate query composition and analysis.
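As an illustration of the general idea of predicting a query property from the query text alone, with no database statistics and no execution plan, the following is a minimal sketch. It is not the models evaluated in this work: it assumes a simple token-count naive Bayes classifier as a stand-in for a traditional machine learning baseline, and the names `tokenize`, `QueryLabelModel`, and `TOY_WORKLOAD`, the example queries, and the two labels `ok` and `error` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sql):
    """Split raw SQL text into lowercase word, number, and symbol tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*|\d+|[^\sa-z0-9_]", sql.lower())


class QueryLabelModel:
    """Multinomial naive Bayes over query tokens (illustrative baseline only)."""

    def fit(self, queries, labels):
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.label_counts = Counter(labels)       # label -> number of queries
        self.vocab = set()
        for sql, label in zip(queries, labels):
            tokens = tokenize(sql)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, sql):
        tokens = tokenize(sql)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.token_counts[label]
            # Laplace smoothing so unseen tokens do not zero out a class.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical (query text, label) pairs; a real workload would be far larger
# and the labels would come from the logged outcome of each query.
TOY_WORKLOAD = [
    ("SELECT ra, dec FROM PhotoObj WHERE ra > 180", "ok"),
    ("SELECT TOP 10 objID FROM PhotoObj", "ok"),
    ("SELECT COUNT(*) FROM SpecObj WHERE z < 0.1", "ok"),
    ("SELEC objID FROM PhotoObj", "error"),
    ("SELECT ra dec FROM WHERE PhotoObj", "error"),
    ("SELECT FROM WHERE z <", "error"),
]

queries, labels = zip(*TOY_WORKLOAD)
model = QueryLabelModel().fit(queries, labels)

# Predictions use only the query string; nothing is executed.
print(model.predict("SELEC ra FROM SpecObj"))
print(model.predict("SELECT TOP 10 ra, dec FROM PhotoObj"))
```

The same text-only setup carries over to the other properties mentioned above by swapping the label for a run-time or answer-size target and the classifier for a regressor; the sketch only shows the classification case.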