
Comparing Broadband ISP Performance using Big Data from M-Lab

by Xiaohong Deng, et al.

Comparing ISPs on broadband speed is challenging, since measurements can vary due to subscriber attributes such as operating system, and test conditions such as access capacity, server distance, TCP window size, time-of-day, and network segment size. In this paper, we draw inspiration from observational studies in medicine, which face a similar challenge in comparing the effect of treatments on patients with diverse characteristics, and have successfully tackled it using "causal inference" techniques for post facto analysis of medical records. Our first contribution is to develop a tool to pre-process and visualize the millions of data points in M-Lab at various time- and space-granularities to get preliminary insights into the factors affecting broadband performance. Next, we analyze 24 months of data pertaining to twelve ISPs across three countries, and demonstrate that there is observational bias in the data due to disparities amongst ISPs in their attribute distributions. For our third contribution, we apply a multi-variate matching method to identify suitable cohorts that can be compared without bias, which reveals that ISPs are closer in performance than previously thought. Our final contribution is to refine our model by developing a method for estimating speed-tier and re-applying matching for comparison of ISP performance. Our results challenge conventional rankings of ISPs, and pave the way towards data-driven approaches for unbiased comparisons of ISPs world-wide.
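The idea of comparing ISPs only on matched cohorts can be illustrated with a small sketch. The abstract does not say which multi-variate matching method the authors use, so the code below substitutes a simple stand-in: greedy 1:1 nearest-neighbour matching on standardized covariates with a caliper. The field names (`rtt_ms`, `download_mbps`), the caliper value, and all function names are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, pstdev


def standardize(rows, keys):
    """Scale each covariate to zero mean and unit variance over all rows."""
    stats = {}
    for k in keys:
        vals = [r[k] for r in rows]
        stats[k] = (mean(vals), pstdev(vals) or 1.0)  # guard constant columns
    return [[(r[k] - stats[k][0]) / stats[k][1] for k in keys] for r in rows]


def match_cohorts(tests_a, tests_b, keys, caliper=0.5):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Each test of ISP A is paired with the closest unused test of ISP B in
    standardized covariate space; tests with no neighbour within `caliper`
    are dropped. Returns a list of (index_in_a, index_in_b) pairs.
    NOTE: an illustrative stand-in, not the method used in the paper.
    """
    scaled = standardize(tests_a + tests_b, keys)
    za, zb = scaled[:len(tests_a)], scaled[len(tests_a):]
    unused = set(range(len(tests_b)))
    pairs = []
    for i, va in enumerate(za):
        best, best_d = None, None
        for j in sorted(unused):  # sorted for deterministic tie-breaking
            d = math.dist(va, zb[j])
            if d <= caliper and (best is None or d < best_d):
                best, best_d = j, d
        if best is not None:
            unused.remove(best)
            pairs.append((i, best))
    return pairs


def matched_speed_gap(tests_a, tests_b, pairs, outcome="download_mbps"):
    """Mean outcome difference (A minus B) over the matched pairs only."""
    return mean(tests_a[i][outcome] - tests_b[j][outcome] for i, j in pairs)


# Hypothetical speed tests: ISP A has one test taken under poor conditions.
isp_a = [{"rtt_ms": 10, "download_mbps": 90},
         {"rtt_ms": 200, "download_mbps": 5}]
isp_b = [{"rtt_ms": 10, "download_mbps": 80},
         {"rtt_ms": 12, "download_mbps": 70}]

pairs = match_cohorts(isp_a, isp_b, keys=["rtt_ms"])
naive_gap = (mean(t["download_mbps"] for t in isp_a)
             - mean(t["download_mbps"] for t in isp_b))
matched_gap = matched_speed_gap(isp_a, isp_b, pairs)
```

In this toy data the naive average favours ISP B only because one of ISP A's tests was taken at a much higher round-trip time. Restricting the comparison to tests with similar covariates removes that test from the comparison, which is the kind of observational bias the abstract describes.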




Probabilistic Matching: Causal Inference under Measurement Errors

The abundance of data produced daily from large variety of sources has b...

Using Experimental Data to Evaluate Methods for Observational Causal Inference

Methods that infer causal dependence from observational data are central...

Estimating Residential Broadband Capacity using Big Data from M-Lab

Knowing residential broadband capacity profiles across a population is o...

Distributed Design for Causal Inferences on Big Observational Data

A fundamental issue in causal inference for Big Observational Data is co...

Identifying Candidate Risk Factors for Prescription Drug Side Effects using Causal Contrast Set Mining

Big longitudinal observational databases present the opportunity to extr...

Causal Effect Estimation for Multivariate Continuous Treatments

Causal inference is widely used in various fields, such as biology, psyc...