CONQ: CONtinuous Quantile Treatment Effects for Large-Scale Online Controlled Experiments

by Weinan Wang, et al.

In many industry settings, online controlled experimentation (A/B testing) has been broadly adopted as the gold standard for measuring the impact of products or features. Most research has focused on user engagement metrics, measuring treatment effects at the mean (average treatment effects, ATE); only a few studies have addressed performance metrics (e.g., latency), where treatment effects are measured at quantiles. Measuring quantile treatment effects (QTE) is challenging because of myriad difficulties, such as the dependency introduced by clustered samples, scalability issues, and the choice of density bandwidth. In addition, previous literature has mainly focused on QTE at a few pre-defined locations, such as P50 or P90, an approach that does not always convey the full picture. In this paper, we propose a novel, scalable, non-parametric solution that provides a continuous range of QTE with point-wise confidence intervals while circumventing density estimation altogether. Numerical results show high consistency with traditional methods that rely on asymptotic normality. An end-to-end pipeline has been implemented at Snap Inc., providing daily insights into key performance metrics at a distributional level.
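The abstract does not spell out the CONQ estimator, so the following is only a minimal sketch of one standard density-free way to obtain point-wise QTE confidence intervals, not the paper's method. It assumes i.i.d. samples in each arm (it ignores the clustered-sample dependency the abstract mentions), and the helper names `quantile_ci` and `continuous_qte` are hypothetical. Each arm's quantile interval is read off two order statistics whose ranks invert the normal-approximation interval of the empirical CDF, so no density or bandwidth choice is needed; the QTE interval then combines standard errors backed out of the two arms' interval widths.

```python
import math
from statistics import NormalDist

import numpy as np


def quantile_ci(x, p, alpha=0.05):
    """Estimate the p-th quantile of x with a density-free CI.

    The rank of the p-th sample quantile is roughly Binomial(n, p), so the
    CI bounds are the order statistics at ranks n * (p -/+ z * sqrt(p(1-p)/n)).
    ASSUMPTION: i.i.d. samples; clustered (e.g. per-user) data is not handled.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(p * (1 - p) / n)

    def at_rank(r):
        # Convert a 1-based rank to a clipped 0-based index into xs
        return xs[min(max(r, 1), n) - 1]

    est = at_rank(math.ceil(n * p))  # smallest x with empirical CDF >= p
    lower = at_rank(math.floor(n * (p - half)))
    upper = at_rank(math.ceil(n * (p + half)))
    return est, lower, upper


def continuous_qte(treatment, control, ps, alpha=0.05):
    """QTE (treatment minus control) with point-wise CIs on a grid of levels ps.

    Each arm's standard error is backed out of its CI width, and the two are
    combined assuming the arms are independent.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    results = []
    for p in ps:
        et, lt, ut = quantile_ci(treatment, p, alpha)
        ec, lc, uc = quantile_ci(control, p, alpha)
        se = math.hypot((ut - lt) / (2 * z), (uc - lc) / (2 * z))
        qte = et - ec
        results.append((p, qte, qte - z * se, qte + z * se))
    return results
```

Passing a fine grid such as `np.linspace(0.01, 0.99, 99)` as `ps` traces an approximately continuous QTE curve with point-wise intervals. For simplicity this sketch re-sorts each arm at every quantile level; at scale one would sort (or bin) each arm once and reuse it.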






