Detecting Correlated Gaussian Databases
This paper considers the problem of detecting whether two databases, each consisting of n users with d Gaussian features, are correlated. Under the null hypothesis, the databases are independent. Under the alternative hypothesis, the features are correlated across the databases, up to an unknown row permutation. A simple test is developed to show that detection is achievable above ρ^2 ≈ 1/d. For the converse, the truncated second moment method is used to establish that detection is impossible below roughly ρ^2 ≈ 1/(d√n). These results are compared to the corresponding recovery problem, where the goal is to decode the row permutation and for which a converse bound of roughly ρ^2 ≈ 1 - n^(-4/d) has been previously shown. For certain choices of parameters, the detection achievability bound outperforms this recovery converse bound, demonstrating that detection can be easier than recovery in this scenario.
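The abstract does not spell out the test, so the following is only a minimal sketch of the detection setup under stated assumptions: features are unit-variance Gaussians, ρ is positive and known to the tester, and the statistic is the inner product of the two databases' column sums. That statistic is chosen here because it does not depend on the row order; it is not necessarily the test used in the paper. The names sample_databases, sum_statistic, and detect are hypothetical.

```python
import math
import random


def sample_databases(n, d, rho, correlated, rng):
    """Return (X, Y): two lists of n rows, each row holding d N(0, 1) features."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    if not correlated:
        # Null hypothesis: Y is drawn independently of X.
        Y = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
        return X, Y
    # Alternative hypothesis: each feature of Y has correlation rho with the
    # matching feature of X, then the rows of Y are shuffled.
    s = math.sqrt(1.0 - rho * rho)
    Y = [[rho * x + s * rng.gauss(0.0, 1.0) for x in row] for row in X]
    rng.shuffle(Y)  # unknown row permutation
    return X, Y


def sum_statistic(X, Y):
    """Inner product of the column sums; unchanged by any row permutation."""
    d = len(X[0])
    sx = [sum(row[k] for row in X) for k in range(d)]
    sy = [sum(row[k] for row in Y) for k in range(d)]
    return sum(a * b for a, b in zip(sx, sy))


def detect(X, Y, rho):
    """Declare 'correlated' if the statistic exceeds half its mean under the alternative.

    Assumption: rho > 0 and known; the mean under the alternative is n * d * rho.
    """
    n, d = len(X), len(X[0])
    return sum_statistic(X, Y) > 0.5 * n * d * rho


# Example run with illustrative parameters (not taken from the paper).
rng = random.Random(0)
X0, Y0 = sample_databases(20, 400, 0.9, correlated=False, rng=rng)
X1, Y1 = sample_databases(20, 400, 0.9, correlated=True, rng=rng)
print(detect(X0, Y0, 0.9), detect(X1, Y1, 0.9))
```

Under this assumed model, the statistic has mean 0 and standard deviation n√d under the null hypothesis, and mean ndρ under the alternative hypothesis. The two cases therefore separate once ρ√d is large, which agrees with the ρ^2 ≈ 1/d scaling quoted for achievability.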