How to Measure Your App: A Couple of Pitfalls and Remedies in Measuring App Performance in Online Controlled Experiments

by Yuxiang Xie et al.

Effectively measuring, understanding, and improving mobile app performance is of paramount importance for mobile app developers. Across the mobile Internet landscape, companies run online controlled experiments (A/B tests) with thousands of performance metrics to understand how app performance causally impacts user retention and to guard against service or app regressions that degrade the user experience. To capture characteristics particular to performance metrics, such as enormous observation volume and highly skewed distributions, an industry-standard practice is to compute a performance metric as a quantile over all performance events in the control or treatment bucket of an A/B test. In our experience with thousands of A/B tests at Snap, we have discovered pitfalls in this standard way of calculating performance metrics that can lead to unexplained movements in performance metrics and unexpected misalignment with user engagement metrics. In this paper, we discuss two major pitfalls in this industry-standard practice of measuring performance for mobile apps: one arises from strong heterogeneity across both mobile devices and user engagement levels, and the other from self-selection bias caused by post-treatment changes in user engagement. To remedy these pitfalls, we introduce several scalable methods, including user-level performance metric calculation, and imputation and matching for missing metric values. We have extensively evaluated these methods on both simulated data and real A/B tests, and have deployed them into Snap's in-house experimentation platform.
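The heterogeneity pitfall and the user-level remedy can be illustrated with a minimal sketch. The user counts, latency distributions, and quantile level below are illustrative assumptions, not numbers from the paper: a handful of heavy users emit far more events than everyone else, so an event-level quantile pooled over the whole bucket is dominated by them, while a user-level calculation (a quantile per user, then an aggregate across users) weights every user equally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated latency events (ms): 90 light users with 10 events each,
# 10 heavy users (slower devices, heavier engagement) with 1000 events each.
light_users = [rng.lognormal(mean=5.0, sigma=0.3, size=10) for _ in range(90)]
heavy_users = [rng.lognormal(mean=6.0, sigma=0.3, size=1000) for _ in range(10)]
users = light_users + heavy_users

# Industry-standard metric: one quantile over ALL events in the bucket.
# Heavy users contribute ~92% of the events, so they dominate this number.
event_level_p90 = np.quantile(np.concatenate(users), 0.90)

# User-level remedy: compute the quantile per user first, then aggregate
# across users, so each user counts once regardless of event volume.
user_level_p90 = np.mean([np.quantile(u, 0.90) for u in users])

print(f"event-level p90: {event_level_p90:.1f} ms")
print(f"user-level  p90: {user_level_p90:.1f} ms")
```

With this setup the event-level p90 sits far above the user-level p90, even though only 10% of users are slow, which is exactly the kind of unexplained metric movement the abstract describes when the mix of heavy and light users shifts between buckets.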


