Phase Transitions in the Detection of Correlated Databases

02/07/2023 ∙ by Dor Elimelech, et al.

We study the problem of detecting the correlation between two Gaussian databases X ∈ ℝ^{n×d} and Y ∈ ℝ^{n×d}, each composed of n users with d features. This problem is relevant in the analysis of social media, computational biology, etc. We formulate this as a hypothesis testing problem: under the null hypothesis, these two databases are statistically independent. Under the alternative, however, there exists an unknown permutation σ over the set of n users (or, row permutation), such that X is ρ-correlated with Y^σ, a permuted version of Y. We determine sharp thresholds at which optimal testing exhibits a phase transition, depending on the asymptotic regime of n and d. Specifically, we prove that if ρ^2 d → 0 as d → ∞, then weak detection (performing slightly better than random guessing) is statistically impossible, irrespective of the value of n. This complements the performance of a simple test that thresholds the sum of all entries of X^T Y. Furthermore, when d is fixed, we prove that strong detection (vanishing error probability) is impossible for any ρ < ρ^⋆, where ρ^⋆ is an explicit function of d, while weak detection is again impossible as long as ρ^2 d → 0. These results close significant gaps in recent related studies.
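
The simple sum test mentioned in the abstract can be illustrated with a minimal sketch. The reading used here is an assumption, not taken from the paper: the statistic is the sum of the inner products ⟨X_i, Y_j⟩ over all pairs of rows, which equals the inner product of the two column-sum vectors and therefore does not depend on the hidden permutation. With users stored as rows this is the sum of all entries of X Y^T; it matches X^T Y under a convention where users are stored as columns. The function names, the sampling model (standard Gaussian entries, entrywise ρ-correlation) and the threshold n·d·ρ/2 are likewise hypothetical choices for illustration.

```python
import math
import random


def sample_databases(n, d, rho, correlated, rng):
    """Draw X and Y (n rows, d columns) with standard Gaussian entries.

    If `correlated`, each entry of Y is rho-correlated with the matching
    entry of X, and the rows of Y are then shuffled by a hidden permutation.
    This sampling model is an assumption made for the sketch.
    """
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    if correlated:
        s = math.sqrt(1.0 - rho * rho)
        Y = [[rho * x + s * rng.gauss(0.0, 1.0) for x in row] for row in X]
        rng.shuffle(Y)  # hidden row permutation sigma
    else:
        Y = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    return X, Y


def sum_statistic(X, Y):
    """Sum of <X_i, Y_j> over all row pairs (i, j).

    Computed as the inner product of the two column-sum vectors, so the
    value does not depend on the order of the rows of X or Y.
    """
    col_sums_x = [sum(col) for col in zip(*X)]
    col_sums_y = [sum(col) for col in zip(*Y)]
    return sum(a * b for a, b in zip(col_sums_x, col_sums_y))


def sum_test(X, Y, rho):
    """Return True ('correlated') when the statistic exceeds n*d*rho/2.

    The threshold is a hypothetical midpoint between the null mean (0)
    and the mean under the alternative (n*d*rho) in this sampling model.
    """
    n, d = len(X), len(X[0])
    return sum_statistic(X, Y) > 0.5 * n * d * rho
```

In this model the statistic has standard deviation n·√d under the null and mean n·d·ρ under the alternative, so the two hypotheses separate only when ρ·√d is large. That is consistent with the regime ρ^2 d → 0 in which the abstract states that even weak detection is impossible.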

research
∙ 06/23/2022

Detecting Correlated Gaussian Databases

This paper considers the problem of detecting whether two databases, eac...
research
∙ 11/02/2022

Joint Correlation Detection and Alignment of Gaussian Databases

In this work, we propose an efficient two-stage algorithm solving a join...
research
∙ 08/23/2020

Testing correlation of unlabeled random graphs

We study the problem of detecting the edge correlation between two rando...
research
∙ 05/24/2023

Weak Signal Detection via Displacement Interpolation

Detecting weak, systematic signals hidden in a large collection of p-val...
research
∙ 03/12/2019

The All-or-Nothing Phenomenon in Sparse Linear Regression

We study the problem of recovering a hidden binary k-sparse p-dimensiona...
research
∙ 03/28/2022

Detection threshold for correlated Erdős-Rényi graphs via densest subgraphs

The problem of detecting edge correlation between two Erdős-Rényi random...
research
∙ 05/09/2019

Limits of Deepfake Detection: A Robust Estimation Viewpoint

Deepfake detection is formulated as a hypothesis testing problem to clas...
