Database Matching Under Column Deletions

05/20/2021
by Serhat Bakirtas, et al.

De-anonymizing users by matching the various forms of user data available on the internet raises privacy concerns. A fundamental understanding of the privacy leakage in such scenarios requires a careful study of the conditions under which correlated databases can be matched. Motivated by synchronization errors in time-indexed databases, this work investigates the matching of random databases under random column deletion. Adapting tools from information theory, in particular those developed for the deletion channel, conditions for database matching are derived both with and without deletion location information, showing that partial deletion information significantly increases the achievable database growth rate for successful matching. Furthermore, given a batch of correctly matched rows, a deletion detection algorithm that provides partial deletion information is proposed, and a lower bound on its deletion detection probability is derived in terms of the column size and the batch size. The relationship between the database size and the batch size required to guarantee a given deletion detection probability under the proposed algorithm suggests that a batch size growing double-logarithmically with the row size is sufficient for a nonzero detection probability guarantee.
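The abstract does not spell out the detection algorithm, so the following is only a toy sketch of the setting, not the authors' method. It assumes a noiseless binary database, i.i.d. column deletions, and a batch of rows whose correspondence is already known. The function names (`delete_columns`, `detect_deletions`, `match_rows`) and all parameter values are hypothetical. Detection here is a greedy left-to-right column alignment, which recovers the exact deletion pattern only when the batch columns are pairwise distinct.

```python
import random


def delete_columns(db, deleted):
    """Drop the columns whose indices are in `deleted` from every row."""
    return [[x for i, x in enumerate(row) if i not in deleted] for row in db]


def detect_deletions(batch, batch_deleted):
    """Estimate deleted column indices from a batch of correctly matched rows.

    Assumption (not from the paper): noiseless entries, so every surviving
    column appears verbatim, in order, among the original batch columns.
    """
    n = len(batch[0])
    cols = list(zip(*batch))              # original batch, one tuple per column
    cols_del = list(zip(*batch_deleted))  # batch after column deletion
    kept, i = [], 0
    for col in cols_del:
        # Advance to the next original column that matches this one.
        while i < n and cols[i] != col:
            i += 1
        if i == n:
            raise ValueError("batch is not consistent with pure column deletion")
        kept.append(i)
        i += 1
    return set(range(n)) - set(kept)


def match_rows(db, db_deleted_shuffled, deleted):
    """For each shuffled row, list the original row indices consistent with it."""
    projected = {}
    for idx, row in enumerate(delete_columns(db, deleted)):
        projected.setdefault(tuple(row), []).append(idx)
    return [projected.get(tuple(row), []) for row in db_deleted_shuffled]


# Hypothetical toy parameters: 8 rows, 12 columns, deletion probability 0.3.
rng = random.Random(0)
m, n, p_del, batch_size = 8, 12, 0.3, 6
db = [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]
true_deleted = {i for i in range(n) if rng.random() < p_del}
db_del = delete_columns(db, true_deleted)

# The first `batch_size` rows play the role of the correctly matched batch.
detected = detect_deletions(db[:batch_size], db_del[:batch_size])

# The remaining rows are shuffled and then matched using the detected pattern.
rest = list(range(batch_size, m))
rng.shuffle(rest)
candidates = match_rows(db, [db_del[r] for r in rest], detected)
print(sorted(true_deleted), sorted(detected), candidates)
```

A larger batch makes repeated batch columns less likely, which is the intuition behind needing only a slowly growing batch. The quantitative guarantee in the abstract comes from the paper's own analysis and not from this sketch.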

research
02/03/2022

Seeded Database Matching Under Noisy Column Repetitions

The re-identification or de-anonymization of users from anonymized data ...
research
12/14/2022

Database Matching Under Adversarial Column Deletions

The de-anonymization of users from anonymized microdata through matching...
research
01/17/2023

Database Matching Under Noisy Synchronization Errors

The re-identification or de-anonymization of users from anonymized data ...
research
02/03/2022

Database Matching Under Column Repetitions

Motivated by synchronization errors in the sampling of time-indexed data...
research
05/11/2023

Improved Upper and Lower Bounds on the Capacity of the Binary Deletion Channel

The binary deletion channel with deletion probability d (BDC_d) is a ran...
research
10/14/2022

Upper bounds on the Rate of Uniformly-Random Codes for the Deletion Channel

We consider the maximum coding rate achievable by uniformly-random codes...
