Comparing Broadband ISP Performance using Big Data from M-Lab

01/24/2021
by Xiaohong Deng, et al.

Comparing ISPs on broadband speed is challenging, since measurements can vary due to subscriber attributes such as operating system and test conditions such as access capacity, server distance, TCP window size, time-of-day, and network segment size. In this paper, we draw inspiration from observational studies in medicine, which face a similar challenge in comparing the effect of treatments on patients with diverse characteristics, and have successfully tackled it using "causal inference" techniques for post facto analysis of medical records. Our first contribution is a tool to pre-process and visualize the millions of data points in M-Lab at various temporal and spatial granularities, to gain preliminary insights into the factors affecting broadband performance. Next, we analyze 24 months of data pertaining to twelve ISPs across three countries, and demonstrate that there is observational bias in the data due to disparities amongst ISPs in their attribute distributions. For our third contribution, we apply a multi-variate matching method to identify suitable cohorts that can be compared without bias, which reveals that ISPs are closer in performance than previously thought. Our final contribution is to refine our model by developing a method for estimating speed-tier and re-applying matching to compare ISP performance. Our results challenge conventional rankings of ISPs, and pave the way towards data-driven approaches for unbiased comparisons of ISPs world-wide.
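To make the idea of comparing matched cohorts concrete, the following is a minimal sketch and not the paper's method: it uses a simple coarsened exact matching, in which speed tests are grouped into strata by a few covariates and two ISPs are compared only within strata that contain tests from both. The record fields (`isp`, `os`, `hour`, `rtt_ms`, `mbps`), the bin widths, and the sample values are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical speed-test records; the values are invented for illustration.
SAMPLE_TESTS = [
    {"isp": "A", "os": "win", "hour": 20, "rtt_ms": 10, "mbps": 10.0},
    {"isp": "A", "os": "win", "hour": 21, "rtt_ms": 15, "mbps": 12.0},
    {"isp": "A", "os": "mac", "hour": 3, "rtt_ms": 5, "mbps": 40.0},
    {"isp": "B", "os": "win", "hour": 19, "rtt_ms": 12, "mbps": 9.0},
    {"isp": "B", "os": "mac", "hour": 2, "rtt_ms": 8, "mbps": 38.0},
    {"isp": "B", "os": "mac", "hour": 4, "rtt_ms": 9, "mbps": 42.0},
    {"isp": "B", "os": "mac", "hour": 1, "rtt_ms": 6, "mbps": 40.0},
]


def stratum(test):
    """Coarsen covariates into a stratum key (assumed bins: 6 h, 20 ms)."""
    return (test["os"], test["hour"] // 6, min(test["rtt_ms"] // 20, 5))


def naive_difference(tests, isp_a, isp_b):
    """Difference of raw mean speeds, ignoring covariates entirely."""
    a = [t["mbps"] for t in tests if t["isp"] == isp_a]
    b = [t["mbps"] for t in tests if t["isp"] == isp_b]
    return mean(a) - mean(b)


def matched_difference(tests, isp_a, isp_b):
    """Weighted mean speed difference over strata that contain both ISPs."""
    groups = defaultdict(lambda: {isp_a: [], isp_b: []})
    for t in tests:
        if t["isp"] in (isp_a, isp_b):
            groups[stratum(t)][t["isp"]].append(t["mbps"])

    weighted_sum = 0.0
    total_weight = 0
    for g in groups.values():
        if g[isp_a] and g[isp_b]:
            # Weight each stratum by the size of its smaller cohort.
            w = min(len(g[isp_a]), len(g[isp_b]))
            weighted_sum += w * (mean(g[isp_a]) - mean(g[isp_b]))
            total_weight += w
    return weighted_sum / total_weight if total_weight else None
```

On this toy data the naive comparison favours ISP B by a wide margin, simply because B's tests come mostly from the faster stratum, whereas the matched comparison shows the two ISPs to be close. This mirrors, in miniature, the kind of observational bias the abstract describes.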


