QueryER: A Framework for Fast Analysis-Aware Deduplication over Dirty Data

02/03/2022
by Giorgos Alexiou, et al.

In this work, we explore the problem of correctly and efficiently answering complex SPJ (select-project-join) queries issued directly on top of dirty data. We introduce QueryER, a framework that seamlessly integrates Entity Resolution (ER) into Query Processing. QueryER executes analysis-aware deduplication by weaving ER operators into the query plan. Our experimental evaluation shows that the approach adapts to the workload and scales on both real and synthetic datasets.
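To make the idea of analysis-aware deduplication concrete, here is a minimal sketch, not QueryER's actual operators or API. It assumes a toy in-memory table, a hypothetical blocking key (`block_key`, a normalized title), and a naive merge rule that keeps the first non-empty value per field. All names and data in it are invented for illustration. The point it shows is that only the blocks touched by the query's selection are resolved, while the rest of the dirty table is left alone.

```python
from collections import defaultdict


def block_key(record):
    # Hypothetical blocking key: the title, lowercased, alphanumerics only.
    return "".join(ch for ch in record["title"].lower() if ch.isalnum())


def dedup_for_query(table, predicate):
    # Group every record by its blocking key.
    blocks = defaultdict(list)
    for rec in table:
        blocks[block_key(rec)].append(rec)

    # Evaluate the selection on the dirty data, remembering which blocks it hits.
    touched = {block_key(rec) for rec in table if predicate(rec)}

    # Resolve (merge) only the touched blocks; untouched blocks stay dirty.
    resolved = []
    for key in sorted(touched):
        group = blocks[key]
        merged = {}
        for rec in group:
            for field, value in rec.items():
                if field == "id":
                    continue
                # Naive merge rule: keep the first non-empty value per field.
                if value and not merged.get(field):
                    merged[field] = value
        merged["ids"] = sorted(r["id"] for r in group)
        resolved.append(merged)
    return resolved


# Invented toy data: records 1 and 2 are duplicates, as are 3 and 4.
table = [
    {"id": 1, "title": "Entity Matching Basics", "venue": "ConfA", "year": 2020},
    {"id": 2, "title": "entity-matching basics", "venue": None, "year": None},
    {"id": 3, "title": "Graph Joins", "venue": "ConfB", "year": 2019},
    {"id": 4, "title": "graph joins", "venue": None, "year": 2019},
]

# Selection: papers from 2020. Only the block of records 1 and 2 is resolved.
result = dedup_for_query(table, lambda r: r["year"] == 2020)
print(result)
```

In this sketch, record 2 alone would not satisfy the selection because its year is missing, yet it still ends up in the answer through its duplicate, which is why cleaning has to be aware of the query instead of running as a blind post-filter.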


