Synthesizing Linked Data Under Cardinality and Integrity Constraints

03/26/2021
by Amir Gilad, et al.

Synthetic data generation is useful in many settings, from application testing to benchmarking to privacy preservation. An important part of this problem is generating the links between relations subject to cardinality constraints (CCs) and integrity constraints (ICs). We are given instances of two relations, where one has a foreign key dependency on the other and is missing its foreign key (FK) values, together with two types of constraints: (1) CCs that apply to the join view and (2) ICs that apply to the table with missing FK values. Our goal is to impute the missing FK values such that the constraints are satisfied. We provide a novel framework for the problem based on declarative CCs and ICs. We further show that the problem is NP-hard and propose a novel two-phase solution that guarantees the satisfaction of the ICs. Phase I yields an intermediate solution that accounts for the CCs alone and relies on a hybrid approach based on CC types. For one type, the problem is modeled as an Integer Linear Program. For the others, we describe an efficient and accurate solution. We then combine the two solutions. Phase II augments this intermediate solution by incorporating the ICs, using a coloring of the conflict hypergraph to infer the values of the FK column. Our extensive experimental study shows that our solution scales well as the data size and the number of constraints increase. We further show that our solution maintains low error rates for the CCs.
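To make the Phase II idea concrete, here is a minimal Python sketch of inferring FK values by coloring a conflict hypergraph. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not specify the coloring procedure, so a simple greedy heuristic is swapped in, the CCs are ignored entirely, and the function name `color_conflict_hypergraph` and the toy row and FK identifiers are hypothetical. Each vertex is a row with a missing FK, each hyperedge (of size two or more) is a set of rows that an IC forbids from all sharing the same FK value, and each color is a candidate FK value.

```python
from typing import Dict, FrozenSet, Hashable, List, Optional, Sequence


def color_conflict_hypergraph(
    rows: Sequence[Hashable],
    hyperedges: Sequence[FrozenSet[Hashable]],
    fk_values: Sequence[Hashable],
) -> Optional[Dict[Hashable, Hashable]]:
    """Greedily assign an FK value (color) to every row (vertex) so that
    no hyperedge ends up with all of its rows sharing one FK value.

    Returns None when the greedy pass gets stuck; that does not prove
    that no valid assignment exists.
    """
    # Index the hyperedges by the rows they contain.
    edges_by_row: Dict[Hashable, List[FrozenSet[Hashable]]] = {r: [] for r in rows}
    for edge in hyperedges:
        for r in edge:
            edges_by_row.setdefault(r, []).append(edge)

    assignment: Dict[Hashable, Hashable] = {}
    for r in rows:
        for v in fk_values:
            # v is safe for r if every hyperedge containing r has at least
            # one other row that is unassigned or holds a different value.
            safe = all(
                any(assignment.get(other) != v for other in edge if other != r)
                for edge in edges_by_row[r]
            )
            if safe:
                assignment[r] = v
                break
        else:
            return None  # no candidate FK value worked for this row
    return assignment


# Toy instance (hypothetical data): four child rows, two candidate FK values.
rows = ["p1", "p2", "p3", "p4"]
hyperedges = [frozenset({"p1", "p2"}), frozenset({"p2", "p3", "p4"})]
fk_values = ["h1", "h2"]

result = color_conflict_hypergraph(rows, hyperedges, fk_values)
print(result)
```

In this toy instance, the first hyperedge says that `p1` and `p2` may not reference the same parent row, and the second says that `p2`, `p3`, and `p4` may not all reference the same one. A greedy pass is fast but incomplete, which is consistent with the problem being NP-hard: a `None` result only means this heuristic failed, not that the ICs are unsatisfiable.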

research
03/21/2017

On The Projection Operator to A Three-view Cardinality Constrained Set

The cardinality constraint is an intrinsic way to restrict the solution ...
research
05/22/2019

A Hypergraph Based Approach for the 4-Constraint Satisfaction Problem Tractability

Constraint Satisfaction Problem (CSP) is a framework for modeling and so...
research
12/23/2022

The Consistency of Probabilistic Databases with Independent Cells

A probabilistic database with attribute-level uncertainty consists of re...
research
10/05/2018

Improved Inapproximability of Rainbow Coloring

A rainbow q-coloring of a k-uniform hypergraph is a q-coloring of the ve...
research
06/15/2018

Efficient Handling of SPARQL OPTIONAL for OBDA (Extended Version)

OPTIONAL is a key feature in SPARQL for dealing with missing information...
research
04/08/2020

Fast and Reliable Missing Data Contingency Analysis with Predicate-Constraints

Today, data analysts largely rely on intuition to determine whether miss...
research
08/28/2018

Cost-efficient Data Acquisition on Online Data Marketplaces for Correlation Analysis

Incentivized by the enormous economic profits, the data marketplace plat...
