Validation of Matching

10/31/2014
by Ya Le, et al.

We introduce a technique to compute probably approximately correct (PAC) bounds on precision and recall for matching algorithms. The bounds require some verified matches, but those matches may be used to develop the algorithms. The bounds can be applied to network reconciliation or entity resolution algorithms, which identify nodes in different networks or values in a data set that correspond to the same entity. For network reconciliation, the bounds do not require knowledge of the network generation process.
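To make the idea of a PAC bound on precision concrete, here is a minimal sketch using a standard one-sided Hoeffding inequality. This is not the method of the paper: it assumes the verified matches are a uniform random sample of the algorithm's output and were not used to develop the algorithm, which is a stronger requirement than the one stated above. The function name and parameters are hypothetical, chosen for illustration only.

```python
import math


def hoeffding_precision_lower_bound(num_correct: int, num_checked: int, delta: float) -> float:
    """Return a lower bound on precision that holds with probability >= 1 - delta.

    Assumption (not from the paper): the num_checked matches are sampled
    uniformly at random from the algorithm's output, independently of how
    the algorithm was developed.
    """
    if num_checked <= 0:
        raise ValueError("num_checked must be positive")
    if not 0 <= num_correct <= num_checked:
        raise ValueError("num_correct must be between 0 and num_checked")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")

    # Fraction of the checked matches that were verified as correct.
    empirical_precision = num_correct / num_checked
    # One-sided Hoeffding: P(empirical - true >= eps) <= exp(-2 * n * eps^2).
    # Setting the right-hand side to delta and solving for eps gives:
    epsilon = math.sqrt(math.log(1.0 / delta) / (2.0 * num_checked))
    # Precision cannot be negative, so clip the bound at zero.
    return max(0.0, empirical_precision - epsilon)


if __name__ == "__main__":
    # 90 of 100 sampled matches verified correct, 95% confidence.
    print(hoeffding_precision_lower_bound(90, 100, 0.05))
```

The bound tightens at a rate of one over the square root of the number of checked matches, so quadrupling the verified sample halves the gap between the empirical precision and the guaranteed lower bound.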

research
02/17/2021

A General Framework for the Derandomization of PAC-Bayesian Bounds

PAC-Bayesian bounds are known to be tight and informative when studying ...
research
09/10/2015

Performance Bounds for Pairwise Entity Resolution

One significant challenge to scaling entity resolution algorithms to mas...
research
03/17/2021

DomainNet: Homograph Detection for Data Lake Disambiguation

Modern data lakes are deeply heterogeneous in the vocabulary that is use...
research
05/21/2022

On the problem of entity matching and its application in automated settlement of receivables

This paper covers automated settlement of receivables in non-governmenta...
research
08/14/2022

Sharp Frequency Bounds for Sample-Based Queries

A data sketch algorithm scans a big data set, collecting a small amount ...
research
01/15/2014

Transductive Rademacher Complexity and its Applications

We develop a technique for deriving data-dependent error bounds for tran...
research
07/10/2023

SHAP@k:Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features

The SHAP framework provides a principled method to explain the predictio...
