Distributed Sparse Feature Selection in Communication-Restricted Networks

11/02/2021
by Hanie Barghi, et al.

This paper proposes and theoretically analyzes a new distributed scheme for sparse linear regression and feature selection. The primary goal is to learn the few causal features of a high-dimensional dataset from noisy observations generated by an unknown sparse linear model. The training set, which consists of n samples in ℝ^p, is assumed to be already distributed over a large network of N clients connected through extremely low-bandwidth links, and we consider the asymptotic regime 1 ≪ N ≪ n ≪ p. To infer the causal dimensions from the whole dataset, we propose a simple yet effective method for information sharing in the network. We theoretically show that the true causal features can be reliably recovered with a negligible bandwidth usage of O(N log p) across the network. This is a significantly lower communication cost than the trivial approach of transmitting all samples to a single node (the centralized scenario), which requires O(np) transmissions; even more sophisticated schemes such as ADMM still incur a communication complexity of O(Np). Surprisingly, our sample complexity bound is proved to be the same (up to a constant factor) as that of the optimal centralized approach for a fixed performance measure in each node, whereas that of a naïve decentralized technique grows linearly with N. The theoretical guarantees in this paper build on the recent analytic framework of debiased LASSO in Javanmard et al. (2019), and are supported by several computer experiments on both synthetic and real-world datasets.
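The abstract does not spell out the information-sharing protocol, but the O(N log p) budget is consistent with each client transmitting only the indices of its locally selected features (roughly log p bits per index) to a fusion center that aggregates by majority vote. The sketch below illustrates that hypothetical support-sharing scheme, using a plain numpy ISTA solver for the local LASSO fits; the voting rule, regularization level, and all parameters are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def ista_lasso(X, y, lam, iters=800):
    """Solve min_w (1/2n)||Xw - y||^2 + lam*||w||_1 by iterative soft-thresholding."""
    n, p = X.shape
    w = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the smooth part
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        z = w - grad / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return w

rng = np.random.default_rng(0)
p, s, N, n_per = 200, 5, 10, 100          # ambient dim, sparsity, clients, samples/client
support = np.sort(rng.choice(p, s, replace=False))
beta = np.zeros(p)
beta[support] = 2.0                        # true sparse linear model

votes = np.zeros(p, dtype=int)
for _ in range(N):
    # Each client holds its own local slice of the data.
    X = rng.standard_normal((n_per, p))
    y = X @ beta + 0.5 * rng.standard_normal(n_per)
    w = ista_lasso(X, y, lam=0.25)
    # Only the support indices are sent over the network: ~ s * log2(p) bits,
    # so total traffic across N clients is O(N log p), not O(Np).
    local_support = np.flatnonzero(w)
    votes[local_support] += 1

# Fusion center keeps features selected by a majority of clients.
recovered = np.flatnonzero(votes > N // 2)
print(np.array_equal(recovered, support))
```

With strong coefficients and moderate regularization, every client detects the true features while its spurious selections are mostly uncorrelated across clients, so majority voting filters them out; this is the intuition behind aggregating supports rather than raw data.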

