Balancing Gaussian vectors in high dimension
Motivated by problems in controlled experiments, we study the discrepancy of random matrices with continuous entries where the number of columns n is much larger than the number of rows m. Our first result shows that if ω(1) ≤ m ≤ o(n), a matrix with i.i.d. standard Gaussian entries has discrepancy Θ(√n · 2^{-n/m}) with high probability. This provides sharp guarantees for Gaussian discrepancy in a regime not previously considered in the literature. Our results also apply to a more general family of random matrices with continuous i.i.d. entries, assuming that m ≤ O(n/log n). The proof is non-constructive and is an application of the second moment method. Our second result is algorithmic and applies to random matrices whose entries are i.i.d. and have a Lipschitz density. We present a randomized polynomial-time algorithm that achieves discrepancy e^{-Ω(log²(n)/m)} with high probability, provided that m ≤ O(√(log n)). In the one-dimensional case, this matches the best known algorithmic guarantees due to Karmarkar–Karp. For higher dimensions 2 ≤ m ≤ O(√(log n)), this establishes the first efficient algorithm achieving discrepancy smaller than O(√m).
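For context, the discrepancy of an m × n matrix A here is min over sign vectors x ∈ {−1,+1}^n of ‖Ax‖_∞. The following is a minimal brute-force sketch of that quantity for a small Gaussian instance; it is an illustration of the object being bounded, not the paper's algorithm, and the function name and parameters are illustrative assumptions.

```python
import itertools
import numpy as np

def discrepancy(A):
    """Brute-force discrepancy: minimum over sign vectors x in {-1,+1}^n
    of the infinity norm of A @ x. Exponential in n; only for small n."""
    m, n = A.shape
    best = np.inf
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        x = np.array(signs)
        best = min(best, np.max(np.abs(A @ x)))
    return best

# Small illustration: i.i.d. standard Gaussian entries with n much larger than m.
rng = np.random.default_rng(0)
m, n = 2, 16
A = rng.standard_normal((m, n))
print(discrepancy(A))  # empirically small, in line with the sqrt(n) * 2^{-n/m} scaling
```

The enumeration above takes 2^n time; the point of the paper's second result is a randomized polynomial-time procedure achieving discrepancy e^{-Ω(log²(n)/m)} in the stated regime.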