To Collaborate or Not in Distributed Statistical Estimation with Resource Constraints?

We study how the amount of correlation between observations collected by distinct sensors/learners affects data collection and collaboration strategies by analyzing Fisher information and the Cramér-Rao bound. In particular, we consider a simple setting wherein two sensors sample from a bivariate Gaussian distribution, which already motivates the adoption of various strategies, depending on the correlation between the two variables and resource constraints. We identify two particular scenarios: (1) where the knowledge of the correlation between samples cannot be leveraged for collaborative estimation purposes and (2) where the optimal data collection strategy involves investing scarce resources to collaboratively sample and transfer information that is not of immediate interest and whose statistics are already known, with the sole goal of increasing the confidence in an estimate of the parameter of interest. We discuss two applications, IoT DDoS attack detection and distributed estimation in wireless sensor networks, that may benefit from our results.
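The two scenarios can be illustrated with a minimal sketch, assuming a textbook special case that is not necessarily the paper's exact model: n i.i.d. samples from a bivariate Gaussian with known variances and known correlation rho, where the parameter of interest is the mean mu1 of the first variable. The function name `crb_mean_x1` and its parameters are hypothetical and introduced only for this illustration. Under these assumptions the per-sample Fisher information for mu1 is 1 / (sigma1^2 (1 - rho^2)) when paired samples are available and mu2 is known, and 1 / sigma1^2 otherwise.

```python
def crb_mean_x1(n, sigma1=1.0, rho=0.0, paired=True, mu2_known=True):
    """Cramér-Rao lower bound on the variance of an unbiased estimator of mu1.

    Illustrative assumptions (not taken from the paper):
      - n i.i.d. draws from a bivariate Gaussian,
      - variances and correlation rho are known,
      - `paired` means the second sensor's samples are shared,
      - `mu2_known` means the mean of the second variable is known.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if abs(rho) >= 1.0:
        raise ValueError("rho must satisfy |rho| < 1")

    if paired and mu2_known:
        # (1,1) entry of the inverse covariance matrix: the correlated
        # side observations add information about mu1.
        info_per_sample = 1.0 / (sigma1 ** 2 * (1.0 - rho ** 2))
    else:
        # Either X2 is not observed, or mu2 is an unknown nuisance
        # parameter; in both cases the bound reduces to sigma1^2 / n.
        info_per_sample = 1.0 / sigma1 ** 2

    return 1.0 / (n * info_per_sample)


# Sampling alone, collaborating with mu2 known, collaborating with mu2 unknown.
alone = crb_mean_x1(100, rho=0.8, paired=False)
collab_known = crb_mean_x1(100, rho=0.8, paired=True, mu2_known=True)
collab_unknown = crb_mean_x1(100, rho=0.8, paired=True, mu2_known=False)
```

In this sketch, collaboration tightens the bound by the factor (1 - rho^2) only when the statistics of the second variable are already known, which mirrors scenario (2); when mu2 is also unknown, knowing the correlation does not improve the bound on mu1, which mirrors scenario (1).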




Exploiting a Fleet of UAVs for Monitoring and Data Acquisition of a Distributed Sensor Network

This study proposes an efficient data collection strategy exploiting a t...

Online Debiasing for Adaptively Collected High-dimensional Data

Adaptive collection of data is increasingly commonplace in many applicat...

Data Aggregation Over Multiple Access Wireless Sensors Network

Data collection in Wireless Sensor Networks (WSN) draws significant atte...

"Playing the whole game": A data collection and analysis exercise with Google Calendar

We provide an exercise suitable for early introduction in an undergradua...

A Statistical Model with Qualitative Input

A statistical estimation model with qualitative input provides a mechani...

A note on estimation in a simple probit model under dependency

We consider a probit model without covariates, but the latent Gaussian v...

Household poverty classification in data-scarce environments: a machine learning approach

We describe a method to identify poor households in data-scarce countrie...