Using Positional Sequence Patterns to Estimate the Selectivity of SQL LIKE Queries

02/04/2020
by Mehmet Aytimur, et al.

With the dramatic increase in the amount of text-based data, which commonly contains misspellings and other errors, querying such data with flexible search patterns is becoming increasingly common. Relational databases support the LIKE operator to allow searching with a wildcard predicate (e.g., LIKE 'Sub%', which matches all strings starting with 'Sub'). Because text data is large, executing such queries efficiently is critical for database performance. To build the most efficient execution plan for a LIKE query, the query optimizer requires a selectivity estimate for the pattern-based predicate. The recently proposed SPH algorithm employs a sequence pattern-based histogram structure to estimate the selectivity of LIKE queries. A drawback of the SPH approach is that it often overestimates the selectivity of queries. To alleviate this overestimation problem, in this paper we propose a novel sequence pattern type, called positional sequence patterns. The proposed patterns distinguish sequence item pairs that appear next to each other in all pattern occurrences from those that may have other items between them. In addition, we employ redundant pattern elimination based on pattern information content during histogram construction. Finally, we propose a partitioning-based matching scheme for selectivity estimation. Experimental results on a real dataset from DBLP show that the proposed approach outperforms the state of the art by around 20% in terms of error rate.
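The abstract does not spell out how positional sequence patterns are represented or matched, so the following is only a minimal Python sketch under stated assumptions, not the authors' SPH or positional-histogram method. It assumes a pattern is a list of string items plus one hypothetical `adjacent` flag per consecutive pair (True means the next item must start exactly where the previous one ends; False means other characters may appear in between). It computes exact selectivity (fraction of matching rows) on a toy set of made-up titles, which is the quantity a histogram-based estimator tries to approximate. The helper names `like_to_regex`, `matches_positional`, and `positional_selectivity` are illustrative.

```python
import re
from typing import List, Sequence


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern into a regex ('%' = any run, '_' = one char).

    Assumption: case-sensitive matching and no ESCAPE clause; real databases
    may differ depending on collation.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # DOTALL lets '%' span newlines, as LIKE does.
    return re.compile("".join(parts), re.DOTALL)


def true_selectivity(pattern: str, rows: Sequence[str]) -> float:
    """Exact fraction of rows whose whole value matches the LIKE pattern."""
    rx = like_to_regex(pattern)
    hits = sum(1 for r in rows if rx.fullmatch(r))
    return hits / len(rows) if rows else 0.0


def matches_positional(items: List[str], adjacent: List[bool], text: str) -> bool:
    """Check whether items occur in order in text, honoring adjacency flags.

    adjacent[i] (hypothetical representation) says whether items[i + 1] must
    start immediately after items[i] ends.
    """
    if len(adjacent) != len(items) - 1:
        raise ValueError("need exactly one adjacency flag per item pair")

    def search(i: int, start: int) -> bool:
        if i == len(items):
            return True
        item = items[i]
        if i > 0 and adjacent[i - 1]:
            # Adjacent pair: the item must begin exactly at `start`.
            return text.startswith(item, start) and search(i + 1, start + len(item))
        # Gap allowed: try every later occurrence (backtracking).
        pos = text.find(item, start)
        while pos != -1:
            if search(i + 1, pos + len(item)):
                return True
            pos = text.find(item, pos + 1)
        return False

    return search(0, 0)


def positional_selectivity(items: List[str], adjacent: List[bool],
                           rows: Sequence[str]) -> float:
    """Exact fraction of rows matched by a positional pattern."""
    hits = sum(1 for r in rows if matches_positional(items, adjacent, r))
    return hits / len(rows) if rows else 0.0


# Toy data (invented for illustration, not from DBLP).
titles = [
    "Subquery optimization",
    "Sub-linear query time",
    "Substring indexing",
    "Query rewriting",
]

if __name__ == "__main__":
    print(true_selectivity("Sub%", titles))
    print(positional_selectivity(["Sub", "query"], [True], titles))
    print(positional_selectivity(["Sub", "query"], [False], titles))
```

On these four titles, LIKE 'Sub%' matches three rows (0.75). The gapped pattern ("Sub", then "query" anywhere later) matches two rows (0.5), while the adjacent version matches only "Subquery optimization" (0.25). Because an adjacency constraint can only shrink the set of matching strings, a pattern type that ignores adjacency cannot tell these cases apart, which is one way an estimator can end up overestimating selectivity.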
