ℓ_p-Regression in the Arbitrary Partition Model of Communication
We consider the randomized communication complexity of the distributed ℓ_p-regression problem in the coordinator model, for p∈ (0,2]. In this problem, there is a coordinator and s servers. The i-th server receives A^i∈{-M, -M+1, …, M}^n× d and b^i∈{-M, -M+1, …, M}^n and the coordinator would like to find a (1+ϵ)-approximate solution to min_x∈ℝ^n(∑_i A^i)x - (∑_i b^i)_p. Here M ≤poly(nd) for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For p = 2, i.e., least squares regression, we give the first optimal bound of Θ̃(sd^2 + sd/ϵ) bits. For p ∈ (1,2),we obtain an Õ(sd^2/ϵ + sd/poly(ϵ)) upper bound. Notably, for d sufficiently large, our leading order term only depends linearly on 1/ϵ rather than quadratically. We also show communication lower bounds of Ω(sd^2 + sd/ϵ^2) for p∈ (0,1] and Ω(sd^2 + sd/ϵ) for p∈ (1,2]. Our bounds considerably improve previous bounds due to (Woodruff et al. COLT, 2013) and (Vempala et al., SODA, 2020).
READ FULL TEXT