Anytime-Valid Confidence Sequences in an Enterprise A/B Testing Platform

02/20/2023
by Akash V. Maharaj, et al.

A/B tests are the gold standard for evaluating digital experiences on the web. However, traditional "fixed-horizon" statistical methods are often incompatible with the needs of modern industry practitioners because they do not permit continuous monitoring of experiments. Frequent evaluation of fixed-horizon tests ("peeking") inflates the type-I error and can result in erroneous conclusions. We have released an experimentation service on the Adobe Experience Platform based on anytime-valid confidence sequences, allowing for continuous monitoring of the A/B test and data-dependent stopping. We demonstrate how we adapted and deployed asymptotic confidence sequences in a full-featured A/B testing platform, and describe how sample size calculations can be performed and how alternate test statistics like "lift" can be analyzed. On both simulated data and thousands of real experiments, we show the desirable properties of using anytime-valid methods instead of traditional approaches.
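To make the "peeking" problem concrete, here is a minimal simulation sketch. It is not the platform's implementation. It assumes unit-variance Gaussian observations with a true mean of zero (an A/A setting), and it compares three stopping rules: a z-test checked after every observation, the same z-test checked once at the end, and a two-sided Gaussian-mixture confidence sequence with known variance. The function names, the tuning parameter `rho`, and the simulation sizes are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist


def mixture_radius(t, alpha=0.05, rho=0.1):
    """Half-width at time t of a two-sided Gaussian-mixture confidence
    sequence for the mean of unit-variance data. `rho` is an assumed
    tuning parameter that controls where the boundary is tightest."""
    v = t * rho ** 2 + 1.0
    return math.sqrt(2.0 * v / (t ** 2 * rho ** 2) * math.log(math.sqrt(v) / alpha))


def simulate_false_positive_rates(n_sims=1000, horizon=500, alpha=0.05,
                                  rho=0.1, seed=0):
    """Simulate A/A data (true mean 0) and return the fraction of runs in
    which each rule ever rejects the null hypothesis of zero mean."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Precompute per-time rejection thresholds for the running mean.
    z_bound = [z_crit / math.sqrt(t) for t in range(1, horizon + 1)]
    cs_bound = [mixture_radius(t, alpha, rho) for t in range(1, horizon + 1)]

    peek_hits = fixed_hits = cs_hits = 0
    for _ in range(n_sims):
        total = 0.0
        peek_reject = cs_reject = False
        for t in range(1, horizon + 1):
            total += rng.gauss(0.0, 1.0)
            mean = abs(total / t)
            if mean > z_bound[t - 1]:
                peek_reject = True   # z-test evaluated at every step
            if mean > cs_bound[t - 1]:
                cs_reject = True     # confidence sequence excludes 0
        peek_hits += peek_reject
        cs_hits += cs_reject
        # Fixed-horizon rule: look only once, at the final sample size.
        fixed_hits += abs(total / horizon) > z_bound[horizon - 1]

    return {
        "peeking": peek_hits / n_sims,
        "fixed_horizon": fixed_hits / n_sims,
        "confidence_sequence": cs_hits / n_sims,
    }


if __name__ == "__main__":
    for rule, rate in simulate_false_positive_rates().items():
        print(f"{rule}: {rate:.3f}")
```

Under these assumptions, the continuously monitored z-test should reject far more often than the nominal 5%, while the single final look and the confidence sequence should stay near or below it. The confidence sequence pays for that guarantee with a wider interval at any given sample size.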

research
03/18/2022

Anytime-valid Confidence Intervals for Contingency Tables and Beyond

E variables are tools for designing tests that keep their type-I error g...
research
10/04/2022

Game-theoretic statistics and safe anytime-valid inference

Safe anytime-valid inference (SAVI) provides measures of statistical evi...
research
01/23/2023

Huber-Robust Confidence Sequences

Confidence sequences are confidence intervals that can be sequentially t...
research
10/16/2022

Anytime-Valid F-Tests for Faster Sequential Experimentation Through Covariate Adjustment

We introduce sequential F-tests and confidence sequences for subsets of ...
research
03/05/2020

Backward CUSUM for Testing and Monitoring Structural Change

It is well known that the conventional CUSUM test suffers from low power...
research
08/16/2022

Ensure A/B Test Quality at Scale with Automated Randomization Validation and Sample Ratio Mismatch Detection

eBay's experimentation platform runs hundreds of A/B tests on any given ...
research
11/13/2020

The Safe Log Rank Test: Error Control under Optional Stopping, Continuation and Prior Misspecification

We introduce the safe log rank test, a version of the log rank test that...
