Automatic Weighted Matching Rectifying Rule Discovery for Data Repairing

09/21/2019
by Hiba Abu Ahmad, et al.

Data repairing is a key problem in data cleaning that aims to uncover and rectify data errors. Traditional methods rely on data dependencies to detect errors, but they cannot rectify them. To overcome this limitation, recent methods define repairing rules that are used both to detect and to fix errors. However, all existing data repairing rules are provided by experts, which is expensive in time and effort. Moreover, rule-based data repairing methods need an external verified data source or user verification; without one, they are incomplete and can repair only a small number of errors. In this paper, we define weighted matching rectifying rules (WMRRs), which use similarity matching to capture more errors. We propose a novel algorithm that discovers WMRRs automatically from the dirty data at hand, and we develop an automatic algorithm for resolving inconsistencies among rules. Based on WMRRs, we further propose an automatic data repairing algorithm (WMRR-DR) that uncovers a large number of errors and rectifies them dependably. We evaluate our method experimentally on both real-life and synthetic data. The results show that our method can discover effective WMRRs from the dirty data at hand and perform dependable, fully automatic repairing based on the discovered WMRRs, with higher accuracy than existing dependable methods.
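To make the idea of a similarity-based rectifying rule more concrete, the following is a minimal sketch and not the paper's definition of a WMRR. The abstract does not give the rule format, so everything here is an assumption made for illustration: the `Rule` fields, the weighted-average score, the threshold, and the use of `difflib.SequenceMatcher` as the similarity measure. The sketch matches a record against weighted evidence values and, if the match is strong enough and the target attribute holds a known wrong value, replaces it with the correct one.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


@dataclass
class Rule:
    """Hypothetical rule shape; the field names are illustrative only."""
    evidence: dict        # attribute -> (expected value, positive weight)
    target: str           # attribute that the rule may rewrite
    wrong_values: set     # values of the target treated as errors
    correct_value: str    # value written when the rule fires
    threshold: float      # minimum weighted similarity to fire

    def match_score(self, record: dict) -> float:
        # Weighted average of per-attribute similarities.
        total = sum(weight for _, weight in self.evidence.values())
        score = sum(
            weight * similarity(record.get(attr, ""), expected)
            for attr, (expected, weight) in self.evidence.items()
        )
        return score / total

    def apply(self, record: dict) -> dict:
        # Repair only when the evidence matches and the target is a known error.
        matches = self.match_score(record) >= self.threshold
        if matches and record.get(self.target) in self.wrong_values:
            return {**record, self.target: self.correct_value}
        return record


rule = Rule(
    evidence={"country": ("China", 1.0)},
    target="capital",
    wrong_values={"Shanghai", "Hongkong"},
    correct_value="Beijing",
    threshold=0.75,
)

# The country is misspelled, but it is still similar enough to match.
dirty = {"country": "Chima", "capital": "Shanghai"}
repaired = rule.apply(dirty)

# A record about a different country is left untouched.
other = {"country": "Japan", "capital": "Shanghai"}
unchanged = rule.apply(other)
```

Because matching is approximate, the rule still fires when the evidence attribute itself contains a typo, which is one way a similarity-based rule could capture more errors than exact matching. How the paper assigns weights, discovers rules, and resolves conflicts between them is not described in the abstract and is not modeled here.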
