Trustworthy Online Marketplace Experimentation with Budget-split Design
Online experimentation, also known as A/B testing, is the gold standard for measuring product impact and making business decisions in the tech industry. The validity and utility of experiments, however, hinge on unbiasedness and sufficient power. In two-sided online marketplaces, both requirements are called into question. Bernoulli randomized experiments are biased because treatment units interfere with control units through market competition, violating the "stable unit treatment value assumption" (SUTVA). The experimental power on at least one side of the market is often insufficient because of disparate sample sizes on the two sides. Despite the importance of online marketplaces to the online economy and the crucial role experimentation plays in product improvement, an effective and practical solution to the bias and low-power problems in marketplace experimentation has been lacking. Our paper fills this gap by proposing an experimental design that is unbiased in any marketplace where buyers have a defined budget, which could be finite or infinite. We show that it is more powerful than all other unbiased designs in the literature. We then provide a generalizable system architecture for deploying this design to online marketplaces. Finally, we confirm our findings with empirical performance from experiments run in two real-world online marketplaces.